ARRAYS OF NUCLEIC ACID PROBES ON BIOLOGICAL CHIPS

Cross-Reference to Related Application

This application is a continuation-in-part of USSN
08/284,064, filed August 2, 1994, which is a
continuation-in-part of USSN 08/143,312, filed October 26,
1993, each of which is incorporated by reference in its
entirety for all purposes.

Research leading to the invention was funded in part by NIH
grant No. 1R01HG00813-01, and the government may have certain
rights to the invention.

Background of the Invention 
Field of the Invention

The present invention provides arrays of oligonucleotide
probes immobilized in microfabricated patterns on silica chips
for analyzing molecular interactions of biological interest.
The invention therefore relates to diverse fields impacted by
the nature of molecular interaction, including chemistry,
biology, medicine, and medical diagnostics.

Description of Related Art 

Oligonucleotide probes have long been used to detect
complementary nucleic acid sequences in a nucleic acid of
interest (the "target" nucleic acid). In some assay formats,
the oligonucleotide probe is tethered, i.e., by covalent
attachment, to a solid support, and arrays of oligonucleotide
probes immobilized on solid supports have been used to detect
specific nucleic acid sequences in a target nucleic acid.
See, e.g., PCT patent publication Nos. WO 89/10977 and
89/11548. Others have proposed the use of large numbers of
oligonucleotide probes to provide the complete nucleic acid
sequence of a target nucleic acid but failed to provide an
enabling method for using arrays of immobilized probes for
this purpose. See U.S. Patent Nos. 5,202,231 and 5,002,867
and PCT patent publication No. WO 93/17126.

The development of VLSIPS™ technology has provided
methods for making very large arrays of oligonucleotide probes
in very small areas. See U.S. Patent No. 5,143,854 and PCT
patent publication Nos. WO 90/15070 and 92/10092, each of
which is incorporated herein by reference. U.S. Patent
application Serial No. 082,937, filed June 25, 1993, describes
methods for making arrays of oligonucleotide probes that can
be used to provide the complete sequence of a target nucleic
acid and to detect the presence of a nucleic acid containing a
specific nucleotide sequence.

Microfabricated arrays of large numbers of
oligonucleotide probes, called "DNA chips," offer great promise
for a wide variety of applications. New methods and reagents
are required to realize this promise, and the present
invention helps meet that need.

SUMMARY OF THE INVENTION 
The invention provides several strategies employing
immobilized arrays of probes for comparing a reference
sequence of known sequence with a target sequence showing
substantial similarity with the reference sequence, but
differing in the presence of, e.g., mutations. In a first
embodiment, the invention provides a tiling strategy employing
an array of immobilized oligonucleotide probes comprising at
least two sets of probes. A first probe set comprises a
plurality of probes, each probe comprising a segment of at
least three nucleotides exactly complementary to a subsequence
of the reference sequence, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence. A second probe set
comprises a corresponding probe for each probe in the first
probe set, the corresponding probe in the second probe set
being identical to a sequence comprising the corresponding
probe from the first probe set or a subsequence of at least
three nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the two corresponding probes from the first and
second probe sets. The probes in the first probe set have at
least two interrogation positions corresponding to two
contiguous nucleotides in the reference sequence. One
interrogation position corresponds to one of the contiguous
nucleotides, and the other interrogation position to the
other.

In a second embodiment, the invention provides a tiling
strategy employing an array comprising four probe sets. A
first probe set comprises a plurality of probes, each probe
comprising a segment of at least three nucleotides exactly
complementary to a subsequence of the reference sequence, the
segment including at least one interrogation position
complementary to a corresponding nucleotide in the reference
sequence. Second, third and fourth probe sets each comprise a
corresponding probe for each probe in the first probe set.
The probes in the second, third and fourth probe sets are
identical to a sequence comprising the corresponding probe
from the first probe set or a subsequence of at least three
nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets. The first probe set often has at least 100
interrogation positions corresponding to 100 contiguous
nucleotides in the reference sequence. Sometimes the first
probe set has an interrogation position corresponding to every
nucleotide in the reference sequence. The segment of
complementarity within the probe set is usually about 9-21
nucleotides. Although probes may contain leading or trailing
sequences in addition to the 9-21 nucleotide segment, many
probes consist exclusively of a 9-21 nucleotide segment of
complementarity.
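As an illustration of the four-set tiling just described, the following minimal Python sketch builds, for every window of a reference sequence, the four probes that differ only at one interrogation position. It assumes that probes are written 5'-to-3' as the reverse complement of the reference window and that the interrogation position sits at a fixed 0-based offset `p` within each window; the names `tile_four_sets`, `k` and `p` are hypothetical and are not taken from the specification.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_four_sets(reference: str, k: int, p: int) -> list[dict[str, str]]:
    """Tile `reference` with k-nucleotide probes (hypothetical helper).

    Returns one column per reference window. Each column maps the probe
    nucleotide at the interrogation position (A, C, G or T) to the full
    probe, written 5'-to-3'. The probe whose interrogation nucleotide is
    complementary to the reference belongs to the first probe set; the
    other three belong to the second, third and fourth probe sets.
    """
    if not 0 <= p < k <= len(reference):
        raise ValueError("require 0 <= p < k <= len(reference)")
    idx = k - 1 - p  # window offset p maps here once the probe is reversed
    columns = []
    for start in range(len(reference) - k + 1):
        first_set_probe = reverse_complement(reference[start:start + k])
        columns.append({b: first_set_probe[:idx] + b + first_set_probe[idx + 1:]
                        for b in "ACGT"})
    return columns


# Example: 5-nucleotide probes interrogating the middle of each window.
for column in tile_four_sets("ACGTTGCA", k=5, p=2):
    print(column)
```

Sliding the window one nucleotide at a time means each successive probe gains a 5' nucleotide and loses a 3' nucleotide, and every reference nucleotide except the first `p` and the last `k-1-p` is interrogated by exactly one column of four probes.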

In a third embodiment, the invention provides immobilized
arrays of probes tiled for multiple reference sequences. One
such array comprises at least one pair of first and second
probe groups, each group comprising first and second sets of
probes as defined in the first embodiment. Each probe in the
first probe set from the first group is exactly complementary
to a subsequence of a first reference sequence, and each probe
in the first probe set from the second group is exactly
complementary to a subsequence of a second reference sequence.
Thus, the first group of probes is tiled with respect to a
first reference sequence and the second group of probes with
respect to a second reference sequence. Each group of probes
can also include third and fourth sets of probes as defined in
the second embodiment. In some arrays of this type, the
second reference sequence is a mutated form of the first
reference sequence.

In a fourth embodiment, the invention provides arrays for
block tiling. Block tiling is a species of the general tiling
strategies described above. The usual unit of a block tiling
array is a group of probes comprising a wildtype probe, a
first set of three mutant probes and a second set of three
mutant probes. The wildtype probe comprises a segment of at
least three nucleotides exactly complementary to a subsequence
of a reference sequence. The segment has at least first and
second interrogation positions corresponding to first and
second nucleotides in the reference sequence. The probes in
the first set of three mutant probes are each identical to a
sequence comprising the wildtype probe or a subsequence of at
least three nucleotides thereof including the first and second
interrogation positions, except in the first interrogation
position, which is occupied by a different nucleotide in each
of the three mutant probes and the wildtype probe. The probes
in the second set of three mutant probes are each identical to
a sequence comprising the wildtype probe or a subsequence of
at least three nucleotides thereof including the first and
second interrogation positions, except in the second
interrogation position, which is occupied by a different
nucleotide in each of the three mutant probes and the wildtype
probe.
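The basic block-tiling unit can be generated mechanically. The sketch below is a minimal Python illustration, assuming the wildtype probe sequence is already known and the interrogation positions are given as 0-based offsets into it; `block_tile` is a hypothetical name. Passing two positions yields the seven-probe unit described above (one wildtype probe plus two sets of three mutant probes); passing five contiguous positions yields the sixteen-probe group described for the CFTR gene in the thirteenth embodiment.

```python
from __future__ import annotations


def block_tile(wildtype: str, positions: list[int]) -> dict[str, list[str]]:
    """Build one block-tiling group from a wildtype probe (hypothetical helper).

    `positions` are 0-based interrogation positions in the wildtype probe.
    For each position, three mutant probes are made that are identical to
    the wildtype probe except that the position carries each of the three
    other nucleotides.
    """
    group = {"wildtype": [wildtype]}
    for pos in positions:
        group[f"mutants@{pos}"] = [wildtype[:pos] + b + wildtype[pos + 1:]
                                   for b in "ACGT" if b != wildtype[pos]]
    return group


unit = block_tile("AACGTTG", [3, 4])  # two positions: 1 + 2 x 3 = 7 probes
for name, probes in unit.items():
    print(name, probes)
```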

In a fifth embodiment, the invention provides methods of
comparing a target sequence with a reference sequence using
arrays of immobilized pooled probes. The arrays employed in
these methods represent a further species of the general
tiling arrays noted above. In these methods, variants of a
reference sequence differing from the reference sequence in at
least one nucleotide are identified and each is assigned a
designation. An array of pooled probes is provided, with each
pool occupying a separate cell of the array. Each pool
comprises a probe comprising a segment exactly complementary
to each variant sequence assigned a particular designation.
The array is then contacted with a target sequence comprising
a variant of the reference sequence. The relative
hybridization intensities of the pools in the array to the
target sequence are determined. The identity of the target
sequence is deduced from the pattern of hybridization
intensities. Often, each variant is assigned a designation
having at least one digit and at least one value for the
digit. In this case, each pool comprises a probe comprising a
segment exactly complementary to each variant sequence
assigned a particular value in a particular digit. When
variants are assigned successive numbers in a numbering system
of base m having n digits, n x (m-1) pooled probes are used to
assign each variant a designation.
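The digit-based pooling rule can be checked with a short sketch. This minimal Python example assumes an idealized read-out in which each pool either clearly hybridizes to the target or does not; the specification compares relative intensities, so the binary signal is a simplifying assumption, and the function names are hypothetical. No pool is made for the value 0 of a digit: that value is inferred when none of the m-1 pools for the digit lights up, which is why n x (m-1) pools suffice.

```python
from __future__ import annotations


def digits(index: int, m: int, n: int) -> list[int]:
    """Base-m digits of `index`, least significant first, padded to n digits."""
    out = []
    for _ in range(n):
        index, r = divmod(index, m)
        out.append(r)
    if index:
        raise ValueError("index does not fit in n base-m digits")
    return out


def build_pools(num_variants: int, m: int, n: int) -> dict[tuple[int, int], set[int]]:
    """Map each pool (digit, value), value 1..m-1, to the variants it contains."""
    pools = {(d, v): set() for d in range(n) for v in range(1, m)}
    for variant in range(num_variants):
        for d, v in enumerate(digits(variant, m, n)):
            if v:  # value 0 has no pool; it is inferred from silence
                pools[(d, v)].add(variant)
    return pools


def decode(lit_pools: set[tuple[int, int]], m: int, n: int) -> int:
    """Recover a variant number from the set of pools that hybridized."""
    index = 0
    for d in range(n):
        values = [v for v in range(1, m) if (d, v) in lit_pools]
        if len(values) > 1:
            raise ValueError(f"ambiguous signal at digit {d}")
        index += (values[0] if values else 0) * m ** d
    return index


pools = build_pools(num_variants=9, m=3, n=2)
print(len(pools))  # n x (m-1) = 4 pools
lit = {key for key, members in pools.items() if 5 in members}
print(decode(lit, m=3, n=2))  # 5
```

With m = 3 and n = 2, for example, nine variants are distinguished with 2 x (3 - 1) = 4 pools.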

In a sixth embodiment, the invention provides a pooled
probe for trellis tiling, a further species of the general
tiling strategy. In trellis tiling, the identity of a
nucleotide in a target sequence is determined from a
comparison of hybridization intensities of three pooled
trellis probes. A pooled trellis probe comprises a segment
exactly complementary to a subsequence of a reference sequence
except at a first interrogation position occupied by a pooled
nucleotide N, a second interrogation position occupied by a
pooled nucleotide selected from the group of three consisting
of (1) M or K, (2) R or Y and (3) S or W, and a third
interrogation position occupied by a second pooled nucleotide
selected from the group. The pooled nucleotide occupying the
second interrogation position comprises a nucleotide
complementary to a corresponding nucleotide from the reference
sequence when the pooled probe and the reference sequence
are maximally aligned, and the pooled nucleotide occupying the
third interrogation position comprises a nucleotide
complementary to a corresponding nucleotide from the reference
sequence when the pooled probe and the reference
sequence are maximally aligned. Standard IUPAC nomenclature
is used for describing pooled nucleotides (N = A, C, G or T;
M = A or C; K = G or T; R = A or G; Y = C or T; S = C or G;
W = A or T).

In trellis tiling, an array comprises at least first,
second and third cells, respectively occupied by first, second
and third pooled probes, each according to the generic
description above. However, the segment of complementarity,
location of interrogation positions, and selection of pooled
nucleotide at each interrogation position may or may not
differ between the three pooled probes, subject to the
following constraint. One of the three interrogation
positions in each of the three pooled probes must align with
the same corresponding nucleotide in the reference sequence.
This interrogation position must be occupied by an N in one of
the pooled probes, and by a different pooled nucleotide in each
of the other two pooled probes.
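The following Python sketch shows one way the three intensities at the shared interrogation position could be turned into a call. It assumes that a two-base pooled probe is scored as matching when its intensity reaches a fixed multiple (`threshold`, a hypothetical parameter) of the N-probe intensity; the decision rule and the names are illustrative, not the specification's. Because the pairs (M, K), (R, Y) and (S, W) each split A, C, G and T into two halves in a different way, one half from each of two different pairs always intersects in exactly one nucleotide, so the two match/no-match outcomes identify the probe nucleotide uniquely.

```python
IUPAC = {"N": set("ACGT"), "M": set("AC"), "K": set("GT"),
         "R": set("AG"), "Y": set("CT"), "S": set("CG"), "W": set("AT")}
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def trellis_call(i_n: float, code_2: str, i_2: float,
                 code_3: str, i_3: float, threshold: float = 1.0) -> str:
    """Call one target nucleotide from three pooled trellis probes (hypothetical rule).

    i_n is the signal of the probe carrying N at the shared interrogation
    position; code_2 and code_3 are the two-base IUPAC codes carried there by
    the other two probes. A pooled probe counts as matching when its signal
    is at least `threshold` times i_n. Returns the target nucleotide, i.e.
    the complement of the probe nucleotide that pairs with the target.
    """
    candidates = set("ACGT")
    for code, signal in ((code_2, i_2), (code_3, i_3)):
        if signal >= threshold * i_n:
            candidates &= IUPAC[code]  # probe nucleotide lies in this half
        else:
            candidates -= IUPAC[code]  # probe nucleotide lies in the other half
    if len(candidates) != 1:
        raise ValueError("codes must come from different pairs: (M,K), (R,Y), (S,W)")
    return PAIR[candidates.pop()]


print(trellis_call(100.0, "M", 180.0, "R", 20.0))  # G
```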

In a seventh embodiment, the invention provides arrays
for bridge tiling. Bridge tiling is a species of the general
tiling strategies noted above, in which probes from the first
probe set contain more than one segment of complementarity.
In bridge tiling, a nucleotide in a reference sequence is
usually determined from a comparison of four probes. A first
probe comprises at least first and second segments, each of at
least three nucleotides and exactly complementary to first and
second subsequences of a reference sequence, respectively, the
segments including at least one interrogation position
corresponding to a nucleotide in the reference sequence.
Either (1) the first and second subsequences are noncontiguous
in the reference sequence, or (2) the first and second
subsequences are contiguous and the first and second segments
are inverted relative to the first and second subsequences.
The array further comprises second, third and fourth probes,
which are identical to a sequence comprising the first probe
or a subsequence thereof comprising at least three nucleotides
from each of the first and second segments, except in the at
least one interrogation position, which differs in each of the
probes. In a species of bridge tiling, referred to as
deletion tiling, the first and second subsequences are
separated by one or two nucleotides in the reference sequence.
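For the deletion-tiling species, the first probe can be assembled by joining the two reference subsequences on either side of the skipped nucleotides and taking the reverse complement. The minimal Python sketch below assumes equal-length segments and a single interrogation position given as a 0-based offset into the joined sequence; `deletion_tile` and its parameters are hypothetical. The probe carrying the unsubstituted nucleotide pairs perfectly with a target from which the skipped one or two nucleotides are deleted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion_tile(reference: str, start: int, seg_len: int, gap: int,
                  interrogation: int) -> dict[str, str]:
    """Four deletion-tiling probes keyed by interrogation nucleotide (hypothetical helper).

    The first segment pairs with reference[start:start + seg_len]; the second
    pairs with the seg_len nucleotides that follow `gap` skipped nucleotides.
    `interrogation` is a 0-based offset into the joined (gapped) sequence.
    """
    if gap not in (1, 2):
        raise ValueError("deletion tiling skips one or two nucleotides")
    left = reference[start:start + seg_len]
    right_start = start + seg_len + gap
    right = reference[right_start:right_start + seg_len]
    if len(left) < seg_len or len(right) < seg_len:
        raise ValueError("segments run past the end of the reference")
    joined = left + right  # sequence of a target lacking the skipped nucleotides
    if not 0 <= interrogation < len(joined):
        raise ValueError("interrogation offset outside the joined segments")
    probe = reverse_complement(joined)
    idx = len(joined) - 1 - interrogation
    return {b: probe[:idx] + b + probe[idx + 1:] for b in "ACGT"}


# Skip the "A" at index 4; interrogate the last nucleotide of the left segment.
print(deletion_tile("ACGTAGGCTT", start=0, seg_len=4, gap=1, interrogation=3))
```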
In an eighth embodiment, the invention provides arrays of
probes for multiplex tiling. Multiplex tiling is a strategy
in which the identity of two nucleotides in a target sequence
is determined from a comparison of the hybridization
intensities of four probes, each having two interrogation
positions. Each of the probes comprises a segment of at
least 7 nucleotides that is exactly complementary to a
subsequence from a reference sequence, except that the segment
may or may not be exactly complementary at two interrogation
positions. The nucleotides occupying the interrogation
positions are selected by the following rules: (1) the first
interrogation position is occupied by a different nucleotide
in each of the four probes, (2) the second interrogation
position is occupied by a different nucleotide in each of the
four probes, (3) in first and second probes, the segment is
exactly complementary to the subsequence, except at no more
than one of the interrogation positions, and (4) in third and
fourth probes, the segment is exactly complementary to the
subsequence, except at both of the interrogation positions.
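Rules (1)-(4) can be satisfied by a simple construction, sketched below in Python. It only chooses the nucleotides at the two interrogation positions (the remainder of each segment is the complement of the reference subsequence), and `multiplex_pairs` and `satisfies_rules` are hypothetical names. Note that the rules leave no room for a probe that matches at both positions: each position carries its complementary nucleotide in exactly one probe, and both of those probes must be among the first two, so the first and second probes each carry exactly one mismatch.

```python
from __future__ import annotations

BASES = "ACGT"
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def multiplex_pairs(ref1: str, ref2: str) -> list[tuple[str, str]]:
    """Probe nucleotides at the two interrogation positions of four probes.

    ref1 and ref2 are the reference nucleotides aligned with the two
    positions. Probe 1 matches only at the first position, probe 2 only at
    the second, and probes 3 and 4 mismatch at both.
    """
    c1, c2 = PAIR[ref1], PAIR[ref2]
    other1 = [b for b in BASES if b != c1]
    other2 = [b for b in BASES if b != c2]
    return [(c1, other2[0]), (other1[0], c2),
            (other1[1], other2[1]), (other1[2], other2[2])]


def satisfies_rules(pairs: list[tuple[str, str]], ref1: str, ref2: str) -> bool:
    """Check rules (1)-(4) for four (first, second) nucleotide pairs."""
    c1, c2 = PAIR[ref1], PAIR[ref2]
    mismatches = [(a != c1) + (b != c2) for a, b in pairs]
    return (len({a for a, _ in pairs}) == 4            # rule (1)
            and len({b for _, b in pairs}) == 4        # rule (2)
            and all(m <= 1 for m in mismatches[:2])    # rule (3)
            and all(m == 2 for m in mismatches[2:]))   # rule (4)


print(multiplex_pairs("A", "C"))  # [('T', 'A'), ('A', 'G'), ('C', 'C'), ('G', 'T')]
```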

In a ninth embodiment, the invention provides arrays of
immobilized probes including helper mutations. Helper
mutations are useful for, e.g., preventing self-annealing of
probes having inverted repeats. In this strategy, the
identity of a nucleotide in a target sequence is usually
determined from a comparison of four probes. A first probe
comprises a segment of at least 7 nucleotides exactly
complementary to a subsequence of a reference sequence except
at one or two positions, the segment including an
interrogation position not at the one or two positions. The
one or two positions are occupied by helper mutations.
Second, third and fourth mutant probes are each identical to a
sequence comprising the first (wildtype) probe or a subsequence
thereof including the interrogation position and the one or
two positions, except in the interrogation position, which is
occupied by a different nucleotide in each of the four probes.
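To see how a helper mutation weakens an inverted repeat, one can score a probe by the longest stretch whose reverse complement also occurs in the probe. The Python sketch below uses that score as a simple, assumed proxy for self-annealing potential (it is not a thermodynamic model), and `longest_inverted_repeat` and `with_helper` are hypothetical names. The example probe GCGAATTCGC is its own reverse complement; a single helper mutation at its second nucleotide cuts the longest inverted repeat from 10 to 6 nucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_inverted_repeat(probe: str) -> int:
    """Longest stretch of `probe` whose reverse complement also occurs in it.

    Computed as the longest common substring of the probe and its reverse
    complement by dynamic programming (O(len^2) time). Used here only as an
    assumed proxy for self-annealing potential.
    """
    rc = reverse_complement(probe)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if probe[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def with_helper(probe: str, pos: int, base: str) -> str:
    """Return `probe` with a helper mutation `base` at 0-based position `pos`."""
    if probe[pos] == base:
        raise ValueError("a helper mutation must change the nucleotide")
    return probe[:pos] + base + probe[pos + 1:]


original = "GCGAATTCGC"  # its own reverse complement
helped = with_helper(original, 1, "A")
print(helped, longest_inverted_repeat(original), longest_inverted_repeat(helped))
# GAGAATTCGC 10 6
```

The four probes of the strategy would then keep the helper nucleotide fixed and vary only the interrogation position.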

In a tenth embodiment, the invention provides arrays of
probes comprising at least two probe sets, but lacking a probe
set comprising probes that are perfectly matched to a
reference sequence. Such arrays are usually employed in
methods in which both reference and target sequences are
hybridized to the array. The first probe set comprises a
plurality of probes, each probe comprising a segment exactly
complementary to a subsequence of at least 3 nucleotides of a
reference sequence except at an interrogation position. The
second probe set comprises a corresponding probe for each
probe in the first probe set, the corresponding probe in the
second probe set being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least three nucleotides thereof that includes the
interrogation position, except that the interrogation position
is occupied by a different nucleotide in each of the two
corresponding probes and in the complement to the reference
sequence.

In an eleventh embodiment, the invention provides methods
of comparing a target sequence with a reference sequence
comprising a predetermined sequence of nucleotides using any
of the arrays described above. The methods comprise
hybridizing the target nucleic acid to an array and
determining which probes, relative to one another, in the
array bind specifically to the target nucleic acid. The
relative specific binding of the probes indicates whether the
target sequence is the same as or different from the reference
sequence. In some such methods, the target sequence has a
substituted nucleotide relative to the reference sequence in
at least one undetermined position, and the relative specific
binding of the probes indicates the location of the position
and the nucleotide occupying the position in the target
sequence. In some methods, a second target nucleic acid is
also hybridized to the array. The relative specific binding
of the probes then indicates both whether the target sequence
is the same as or different from the reference sequence, and
whether the second target sequence is the same as or different
from the reference sequence. In some methods, when the array
comprises two groups of probes tiled for first and second
reference sequences, respectively, the relative specific
binding of probes in the first group indicates whether the
target sequence is the same as or different from the first
reference sequence. The relative specific binding of probes
in the second group indicates whether the target sequence is
the same as or different from the second reference sequence.
Such methods are particularly useful for analyzing
heterologous alleles of a gene. Some methods entail
hybridizing both a reference sequence and a target sequence to
any of the arrays of probes described above. Comparison of
the relative specific binding of the probes to the reference
and target sequences indicates whether the target sequence is
the same as or different from the reference sequence.
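A minimal Python sketch of reading a target sequence from relative specific binding follows. It assumes four intensities per interrogation position, keyed by the probe nucleotide at that position, and calls the complement of the brightest probe, reporting an ambiguous "N" when the brightest signal does not exceed the runner-up by an assumed ratio `min_ratio`; the function names and the cutoff are hypothetical. The "called" and second-highest intensities it compares correspond to the quantities plotted in Figs. 19, 21 and 23.

```python
from __future__ import annotations

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities: dict[str, float], min_ratio: float = 1.2) -> str:
    """Call the target nucleotide at one interrogation position.

    `intensities` maps the probe nucleotide at the interrogation position
    (A, C, G, T) to its signal. The brightest probe is taken as the perfect
    match and its complement is returned; if it does not exceed the
    runner-up by `min_ratio`, the call is "N" (ambiguous).
    """
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if intensities[best] < min_ratio * intensities[second]:
        return "N"
    return PAIR[best]


def differences(reference: str, calls: str) -> list[tuple[int, str, str]]:
    """List (position, reference nucleotide, called nucleotide) for confident differences."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, calls))
            if c != "N" and c != r]


columns = [{"A": 40, "C": 30, "G": 35, "T": 900},   # probe T brightest -> target A
           {"A": 50, "C": 900, "G": 80, "T": 60},   # probe C brightest -> target G
           {"A": 500, "C": 480, "G": 10, "T": 10}]  # too close to call -> N
calls = "".join(call_base(col) for col in columns)
print(calls, differences("ACT", calls))  # AGN [(1, 'C', 'G')]
```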

In a twelfth embodiment, the invention provides arrays of
immobilized probes in which the probes are designed to tile a
reference sequence from a human immunodeficiency virus (HIV).
Reference sequences from either the reverse transcriptase gene
or protease gene of HIV are of particular interest. Some
chips further comprise arrays of probes tiling a reference
sequence from a 16S RNA or DNA encoding the 16S RNA from a
pathogenic microorganism. The invention further provides
methods of using such arrays in analyzing an HIV target
sequence. The methods are particularly useful where the
target sequence has a substituted nucleotide relative to the
reference sequence in at least one position, the substitution
conferring resistance to a drug used in treating a patient
infected with HIV. The methods reveal the existence
of the substituted nucleotide. The methods are also
particularly useful for analyzing a mixture of undetermined
proportions of first and second target sequences from
different HIV variants. The relative specific binding of
probes indicates the proportions of the first and second
target sequences.
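For a two-variant mixture, one simple way to turn relative binding into proportions is sketched below in Python. It assumes that, at the position where the two variants diverge, each probe's background-subtracted signal is proportional to the amount of target it matches; a real assay would need calibration against known mixtures such as those described for Fig. 14, and `mutant_fraction` is a hypothetical name.

```python
def mutant_fraction(i_wildtype: float, i_mutant: float,
                    background: float = 0.0) -> float:
    """Estimate the mutant share of a two-variant target mixture.

    i_wildtype and i_mutant are the signals of the probes that match the
    wildtype and mutant nucleotide at the position where the variants
    diverge. Assumes each background-subtracted signal is proportional to
    the amount of target it matches (a linear, uncalibrated model).
    """
    wt = max(i_wildtype - background, 0.0)
    mu = max(i_mutant - background, 0.0)
    if wt + mu == 0:
        raise ValueError("no signal above background")
    return mu / (wt + mu)


print(mutant_fraction(350.0, 150.0, background=50.0))  # 0.25
```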

In a thirteenth embodiment, the invention provides arrays
of probes tiled based on a reference sequence from a CFTR gene.
A preferred array comprises at least a group of probes
comprising a wildtype probe and five sets of three mutant
probes. The wildtype probe comprises a segment exactly
complementary to a subsequence of a reference sequence from a
cystic fibrosis gene, the segment having at least five
interrogation positions
corresponding to five contiguous nucleotides in the reference
sequence. The probes in the first set of three mutant probes 
are each identical to the wildtype probe, except in a first of 
the five interrogation positions, which is occupied by a 
different nucleotide in each of the three mutant probes and 
the wildtype probe. The probes in the second set of three 
mutant probes are each identical to the wildtype probe, except 
in a second of the five interrogation positions, which is 
occupied by a different nucleotide in each of the three mutant 
probes and the wildtype probe. The probes in the third set of 
three mutant probes are each identical to the wildtype probe, 
except in a third of the five interrogation positions, which 
is occupied by a different nucleotide in each of the three 
mutant probes and the wildtype probe. The probes in the 
fourth set of three mutant probes are each identical to the 
wildtype probe, except in a fourth of the five interrogation 
positions, which is occupied by a different nucleotide in each 
of the three mutant probes and the wildtype probe. The probes 
in the fifth set of three mutant probes are each identical to 
the wildtype probe, except in a fifth of the five 
interrogation positions, which is occupied by a different 
nucleotide in each of the three mutant probes and the wildtype 
probe. Preferably, a chip comprises two such groups of 
probes. The first group comprises a wildtype probe exactly 
complementary to a first reference sequence, and the second 
group comprises a wildtype probe exactly complementary to a 
second reference sequence that is a mutated form of the first 
reference sequence. 

The invention further provides methods of using the 
arrays of the invention for analyzing target sequences from a 
CFTR gene. The methods are capable of simultaneously 
analyzing first and second target sequences representing 
heterozygous alleles of a CFTR gene. 

In a fourteenth embodiment, the invention provides arrays
of probes tiling a reference sequence from a p53 gene, an
hMLH1 gene and/or an MSH2 gene. The invention further
provides methods of using the arrays described above to
analyze these genes. The methods are useful, e.g., for
diagnosing patients susceptible to developing cancer.

In a fifteenth embodiment, the invention provides arrays
of probes tiling a reference sequence from a mitochondrial
genome. The reference sequence may comprise part or all of
the D-loop region, or all, or substantially all, of the
mitochondrial genome. The invention further provides methods
of using the arrays described above to analyze target
sequences from a mitochondrial genome. The methods are useful
for identifying mutations associated with disease, and for
forensic, epidemiological and evolutionary studies.

BRIEF DESCRIPTION OF THE FIGURES 
Fig. 1: Basic tiling strategy. The figure illustrates 
the relationship between an interrogation position (I) and a
corresponding nucleotide (n) in the reference sequence, and 
between a probe from the first probe set and corresponding 
probes from second, third and fourth probe sets. 

Fig. 2: Segment of complementarity in a probe from the 
first probe set.

Fig. 3: Incremental succession of probes in a basic 
tiling strategy. The figure shows four probe sets, each 
having three probes. Note that each probe differs from its 
predecessor in the same set by the acquisition of a 5'
nucleotide and the loss of a 3' nucleotide, as well as in the
nucleotide occupying the interrogation position. 

Fig. 4: Exemplary arrangement of lanes on a chip. The 
chip shows four probe sets, each having five probes and each 
having a total of five interrogation positions (I1-I5), one
per probe.

Fig. 5: Hybridization pattern of a chip having probes laid
down in lanes. Dark patches indicate hybridization. The
probes in the lower part of the figure occur at the column of
the array indicated by the arrow when the probe length is 15
and the interrogation position is 7.

Fig. 6: Strategies for detecting deletion and insertion
mutations. Bases in brackets may or may not be present.

Fig. 7: Block tiling strategy. The probe from the first
probe set has three interrogation positions. The probes from
the other probe sets have only one of these interrogation
positions.

Fig. 8: Multiplex tiling strategy. Each probe has two
interrogation positions.

Fig. 9: Helper mutation strategy. The segment of
complementarity differs from the complement of the reference
sequence at a helper mutation as well as the interrogation
position.

Fig. 10: Layout of probes on the HV 407 chip. The figure
shows successive rows of sequence, each of which is subdivided
into four lanes. The four lanes correspond to the A-, C-, G-
and T-lanes on the chip. Each probe is represented by the
nucleotide occupying its interrogation position. The letter
"N" indicates a control probe or empty column. The different
sized probes are laid out in parallel. That is, from top to
bottom, a row of 13-mers is followed by a row of 15-mers,
which is followed by a row of 17-mers, which is followed by a
row of 19-mers.

Fig. 11: Fluorescence pattern of HV 407 hybridized to a
target sequence (pPol19) identical to the chip's reference
sequence.

Fig. 12: Sequence read from HV 407 chip hybridized to
pPol19 and 4MUT18 (separate experiments). The reference
sequence is designated "wildtype." Beneath the reference
sequence are four rows of sequence read from the chip
hybridized to the pPol19 target, the first row being read from
13-mers, the second row from 15-mers, the third row from
17-mers and the fourth row from 19-mers. Beneath these
sequences, there are four further rows of sequence read from
the chip hybridized to the HXB2 target. Successive rows are
read from 13-mers, 15-mers, 17-mers and 19-mers. Each
nucleotide in a row is called from the relative fluorescence
intensities of probes in A-, C-, G- and T-lanes. Regions of
ambiguous sequence read from the chip are highlighted. The
strain differences between the HXB2 sequence and the reference
sequence that were correctly detected are indicated (*), and
those that could not be called are indicated (o). (The
nucleotide at position 417 was read correctly in some
experiments.) The locations of some mutations known to be
associated with drug resistance that occur in readable regions
of the chip are shown above (codon number) and below (mutant
nucleotide) the sequence designated "wildtype." The locations
of primers used to amplify the target sequence are indicated by
arrows.

Fig. 13: Detection of mixed target sequences. The
mutant target differs from the wildtype by a single mutation
in codon 67 of the reverse transcriptase gene. Each different
sized group of probes has a column of four probes for reading
the nucleotide in which the mutation occurs. The four probes
occupying a column are represented by a single probe in the
figure with the symbol (o) indicating the interrogation
position, which is occupied by a different nucleotide in each
probe.

Fig. 14: Fluorescence intensities of target bound to 13-mers
and 15-mers for different proportions of mutant and
wildtype target. The fluorescence intensities are from probes
having interrogation positions for reading the nucleotide at
which the mutant and wildtype targets diverge.

Fig. 15: Sequence read from protease chip from four
clinical samples before and after treatment with ddI.

Fig. 16: Block tiling array of probes for analyzing a
CFTR point mutation. Each probe shown actually represents four
probes, with one probe having each of A, C, G or T at the
interrogation position N. In the order shown, the first probe
shown on the left is tiled from the wildtype reference
sequence, the second probe from the mutant sequence, and so on
in alternating fashion. Note that all of the probes are
identical except at the interrogation position, which shifts
one position between successive probes tiled from the same
reference sequence (e.g., the first, third and fifth probes in
the left-hand column). The grid shows the hybridization
intensities when the array is hybridized to the reference
sequence.

Fig. 17: Hybridization pattern for heterozygous target.
The figure shows the hybridization pattern when the array of 
the previous figure is hybridized to a mixture of mutant and 
wildtype reference sequences. 

Fig. 18, in panels A, B, and C, shows an image made from 
the region of a DNA chip containing CFTR exon 10 probes; in 
panel A, the chip was hybridized to a wild-type target; in 
panel C, the chip was hybridized to a mutant ΔF508 target; and
in panel B, the chip was hybridized to a mixture of the 
wild-type and mutant targets. 

Fig. 19, in sheets 1-3, corresponding to panels A, B, 
and C of Fig. 18, shows graphs of fluorescence intensity 
versus tiling position. The labels on the horizontal axis 
show the bases in the wild-type sequence corresponding to the 
position of substitution in the respective probes. Plotted 
are the intensities observed from the features (or synthesis 
sites) containing wild-type probes, the features containing 
the substitution probes that bound the most target ("called"),
and the feature containing the substitution probes that bound 
the target with the second highest intensity of all the 
substitution probes ("2nd Highest"). 

Fig. 20, in panels A, B, and C, shows an image made from 
a region of a DNA chip containing CFTR exon 10 probes; in 
panel A, the chip was hybridized to the wt480 target; in panel 
C, the chip was hybridized to the mu480 target; and in panel
B, the chip was hybridized to a mixture of the wild-type and 
mutant targets. 

Fig. 21, in sheets 1-3, corresponding to panels A, B, 
and C of Fig. 20, shows graphs of fluorescence intensity 
versus tiling position. The labels on the horizontal axis 
show the bases in the wild-type sequence corresponding to the 
position of substitution in the respective probes. Plotted 
are the intensities observed from the features (or synthesis 
sites) containing wild-type probes, the features containing 
the substitution probes that bound the most target ("called"), 
and the feature containing the substitution probes that bound 
the target with the second highest intensity of all the 
substitution probes ("2nd Highest").

Fig. 22, in panels A and B, shows an image made from a
region of a DNA chip containing CFTR exon 10 probes; in panel
A, the chip was hybridized to nucleic acid derived from the
genomic DNA of an individual with wild-type ΔF508 sequences;
in panel B, the target nucleic acid originated from a
heterozygous (with respect to the ΔF508 mutation) individual.

Fig. 23, in sheets 1 and 2, corresponding to panels A and
B of Fig. 22, shows graphs of fluorescence intensity versus
tiling position. The labels on the horizontal axis show the
bases in the wild-type sequence corresponding to the position
of substitution in the respective probes. Plotted are the
intensities observed from the features (or synthesis sites)
containing wild-type probes, the features containing the
substitution probes that bound the most target ("called"), and
the feature containing the substitution probes that bound the
target with the second highest intensity of all the
substitution probes ("2nd Highest").

Fig. 24: Hybridization of homozygous wildtype (A) and
heterozygous (B) target sequences from exon 11 of the CFTR
gene to a block tiling array designed to detect G551D and
Q552X mutations in the CFTR gene.

Fig. 25: Hybridization of homozygous wildtype (A) and
ΔF508 mutant (B) target sequences from exon 10 of the CFTR
gene to a block tiling array designed to detect the mutations
ΔF508, ΔI507 and F508C.

Fig. 26: Hybridization of heterozygous mutant target
sequences, ΔF508/F508C, to the array of Fig. 25.

Fig. 27 shows the alignment of some of the probes on a
p53 DNA chip with a 12-mer model target nucleic acid.

Fig. 28 shows a set of 10-mer probes for a p53 exon 6 DNA
chip.

Fig. 29 shows that very distinct patterns are observed
after hybridization of p53 DNA chips with targets having
different single-base substitutions. In the first image in
Fig. 29, the 12-mer probes that form perfect matches with the
wild-type target are in the first (top) row. The 12-mer
probes with single-base mismatches are located in the second,
third, and fourth rows and have much lower signals.
Fig. 30, in graphs 2, 3, and 4, graphically depicts the
data in Fig. 29. On each graph, the X axis is the
position of the probe in its row on the chip, and the Y
axis is the signal at that probe site after hybridization.

Fig. 31 shows the results of hybridizing mixed target
populations of WT and mutant p53 genes to the p53 DNA chip.

Fig. 32, in graphs 1-4, shows (see Fig. 30 as well) the
hybridization efficiency of a 10-mer probe array as compared
to a 12-mer probe array.

Fig. 33 shows an image of a p53 DNA chip hybridized to a
target DNA.

Fig. 34 illustrates how the actual sequence was read from
the chip shown in Fig. 33. Gaps in the sequence of letters in
the WT rows correspond to control probes or sites. Positions
at which bases are miscalled are represented by letters in
italic type in cells corresponding to probes in which the WT
bases have been substituted by other bases.

Fig. 35 shows the human mitochondrial genome; "OH" is the
H-strand origin of replication, and arrows indicate the cloned
unshaded sequence.

Fig. 36 shows the image observed from application of a 
sample of mitochondrial DNA derived nucleic acid (from the mt4 
sample) on a DNA chip. 

Fig. 37 is similar to Fig. 36 but shows the image
observed from the mt5 sample.

Fig. 38 shows the predicted difference image between the 
mt4 and mt5 samples on the DNA chip based on mismatches 
between the two samples and the reference sequence. 

Fig. 39 shows the actual difference image observed for
the mt4 and mt5 samples.

Fig. 40, in sheets 1 and 2, shows a plot of normalized 
intensities across rows 10 and 11 of the array and a 
tabulation of the mutations detected. 

Fig. 41 shows the discrimination between wild-type and
mutant hybrids obtained with the chip. A median of the six
normalized hybridization scores for each probe was taken; the
graph plots the ratio of the median score to the normalized
hybridization score versus mean counts. A ratio threshold of
1.6 combined with mean counts above 50 yields no false positives.

Fig. 42 illustrates that the identity of the base mismatch
may influence the ability to discriminate mutant and wild-type
sequences more than the position of the mismatch within an
oligonucleotide probe does. The mismatch position is expressed
as % of probe length from the 3' end. The base change is
indicated on the graph.

Fig. 43 provides a 5' to 3' sequence listing of one
target corresponding to the probes on the chip. X is a
control probe. Positions that differ in the target (i.e., are
mismatched with the probe at the designated site) are in bold.

Fig. 44 shows the fluorescence image produced by scanning
the chip described in Fig. 17 when hybridized to a sample.

Fig. 45 illustrates the detection of 4 transitions in the
target sequence relative to the wild-type probes on the chip
in Fig. 44.

Fig. 46: VLSIPS™ technology applied to the light-directed
synthesis of oligonucleotides. Light (hν) is shone through a
mask to activate functional groups (-OH) on a surface by
removal of a protecting group (X). Nucleoside building blocks
protected with photoremovable protecting groups (T-X, C-X) are
coupled to the activated areas. By repeating the irradiation
and coupling steps, very complex arrays of oligonucleotides
can be prepared.

Fig. 47: Use of the VLSIPS™ process to prepare
"nucleoside combinatorials," or oligonucleotides synthesized by
coupling all four nucleosides to form dimers, trimers, and so
forth.

Fig. 48: Deprotection, coupling, and oxidation steps of
a solid phase DNA synthesis method.

Fig. 49: An illustrative synthesis route for the 
nucleoside building blocks used in the VLSIPS™ method. 

Fig. 50: A preferred photoremovable protecting group,
MeNPOC, and preparation of the group in active form.

Fig. 51: Detection system for scanning a DNA chip. 
DETAILED DESCRIPTION OF THE INVENTION
The invention provides a number of strategies for
comparing a polynucleotide of known sequence (a reference
sequence) with variants of that sequence (target sequences).
The comparison can be performed at the level of entire
genomes, chromosomes, genes, exons or introns, or can focus on
individual mutant sites and immediately adjacent bases. The
strategies allow detection of variations, such as mutations or
polymorphisms, in the target sequence irrespective of whether a
particular variant has previously been characterized. The
strategies both define the nature of a variant and identify
its location in a target sequence.

The strategies employ arrays of oligonucleotide probes
immobilized to a solid support. Target sequences are analyzed
by determining the extent of hybridization at particular
probes in the array. The probes are selected so as to
facilitate distinction between perfectly matched probes and
probes showing single-base or other degrees of mismatches.
The strategy usually entails sampling each nucleotide of
interest in a target sequence several times, thereby achieving
a high degree of confidence in its identity. This level of
confidence is further increased by sampling nucleotides in the
target sequence that are adjacent to the nucleotides of interest.
The number of probes on the chip can be quite large (e.g.,
10^5-10^6). However, usually only a small proportion of the
total number of probes of a given length is represented.
Some advantages of the use of only a small proportion of all
possible probes of a given length include: (i) each position
in the array is highly informative, whether or not
hybridization occurs; (ii) nonspecific hybridization is
minimized; (iii) it is straightforward to correlate
hybridization differences with sequence differences,
particularly with reference to the hybridization pattern of a
known standard; and (iv) the ability to address each probe
independently during synthesis, using high-resolution
photolithography, allows the array to be designed and
optimized for any sequence. For example, the length of any
probe can be varied independently of the others.
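Why only a small proportion of possible probes is represented follows from simple arithmetic: there are 4^n distinct oligonucleotides of length n. The short Python sketch below is illustrative only (the function names are hypothetical); it compares that count with the upper end of the 10^5-10^6 feature range quoted above and ignores that each nucleotide of interest consumes several probes.

```python
def possible_probes(length: int) -> int:
    """Number of distinct oligonucleotides of a given length (4 choices per base)."""
    return 4 ** length


def fraction_on_chip(length: int, features: int) -> float:
    """Largest fraction of all probes of `length` that `features` sites could hold."""
    return min(1.0, features / possible_probes(length))


for n in (11, 13, 15, 17):
    print(f"{n}-mers: {possible_probes(n):,} possible, "
          f"at most {fraction_on_chip(n, 10**6):.1e} on a 10^6-feature chip")
```

For probe lengths in the range discussed below, even a 10^6-feature chip holds well under one percent of all possible 15-mers.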
The present tiling strategies result in sequencing and
comparison methods suitable for routine large-scale practice
with a high degree of confidence in the sequence output.

I. GENERAL TILING STRATEGIES

A. Selection of Reference Sequence
The chips are designed to contain probes exhibiting
complementarity to one or more selected reference sequences
whose sequence is known. The chips are used to read a target
sequence comprising either the reference sequence itself or
variants of that sequence. Target sequences may differ from
the reference sequence at one or more positions but show a
high overall degree of sequence identity with the reference
sequence (e.g., at least 75, 90, 95, 99, 99.9 or 99.99%). Any
polynucleotide of known sequence can be selected as a
reference sequence. Reference sequences of interest include
sequences known to include mutations or polymorphisms
associated with phenotypic changes having clinical
significance in human patients. For example, the CFTR gene
and the p53 gene in humans have been identified as the location of
several mutations resulting in cystic fibrosis or cancer,
respectively. Other reference sequences of interest include
those that serve to identify pathogenic microorganisms and/or
are the site of mutations by which such microorganisms acquire
drug resistance (e.g., the HIV reverse transcriptase gene).
Other reference sequences of interest include regions where
polymorphic variations are known to occur (e.g., the D-loop
region of mitochondrial DNA). These reference sequences have
utility for, e.g., forensic or epidemiological studies. Other
reference sequences of interest include p34 (related to p53),
p65 (implicated in breast, prostate and liver cancer), and DNA
segments encoding cytochromes P450 (see Meyer et al., Pharmac.
Ther. 46, 349-355 (1990)). Other reference sequences of
interest include those from the genome of pathogenic viruses
(e.g., hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1,
HHV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus,
influenza virus, flaviviruses, echovirus, rhinovirus,
coxsackie virus, coronavirus, respiratory syncytial virus,
mumps virus, rotavirus, measles virus, rubella virus,
parvovirus, vaccinia virus, HTLV virus, dengue virus,
papillomavirus, molluscum virus, poliovirus, rabies virus, JC
virus and arboviral encephalitis virus). Other reference
sequences of interest are from genomes or episomes of
pathogenic bacteria, particularly regions that confer drug
resistance or allow phylogenetic characterization of the host
(e.g., 16S rRNA or corresponding DNA). For example, such
bacteria include chlamydia, rickettsial bacteria,
mycobacteria, staphylococci, streptococci, pneumococci,
meningococci and gonococci, klebsiella, proteus, serratia,
pseudomonas, legionella, diphtheria, salmonella, bacilli,
cholera, tetanus, botulism, anthrax, plague, leptospirosis,
and Lyme disease bacteria. Other reference sequences of
interest include those in which mutations result in the
following autosomal recessive disorders: sickle cell anemia,
β-thalassemia, phenylketonuria, galactosemia, Wilson's
disease, hemochromatosis, severe combined immunodeficiency,
alpha-1-antitrypsin deficiency, albinism, alkaptonuria,
lysosomal storage diseases and Ehlers-Danlos syndrome. Other
reference sequences of interest include those in which
mutations result in the following X-linked recessive disorders:
hemophilia, glucose-6-phosphate dehydrogenase deficiency,
agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome,
muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease
and fragile X syndrome. Other reference sequences of interest
include those in which mutations result in the following
autosomal dominant disorders: familial hypercholesterolemia,
polycystic kidney disease, Huntington's disease, hereditary
spherocytosis, Marfan's syndrome, von Willebrand's disease,
neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic
telangiectasia, familial colonic polyposis, Ehlers-Danlos
syndrome, myotonic dystrophy, muscular dystrophy, osteogenesis
imperfecta, acute intermittent porphyria, and von
Hippel-Lindau disease.

The length of a reference sequence can vary widely from a 
full-length genome, to an individual chromosome, episome, 
gene, component of a gene, such as an exon, intron or 



WO 95/11995 



21 



PCT/US94/12305 



regulatory sequences, to a few nucleotides. A reference
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000,
5,000, 10,000, 20,000 or 100,000 nucleotides is common.
Sometimes only particular regions of a sequence (e.g., exons
of a gene) are of interest. In such situations, the
particular regions can be considered as separate reference
sequences or can be considered as components of a single
reference sequence, as a matter of arbitrary choice.

A reference sequence can be any naturally occurring,
mutant, consensus or purely hypothetical sequence of
nucleotides, RNA or DNA. For example, sequences can be
obtained from computer databases or publications, or can be
determined or conceived de novo. Usually, a reference
sequence is selected to show a high degree of sequence
identity to envisaged target sequences. Often, particularly
where a significant degree of divergence is anticipated
between target sequences, more than one reference sequence is
selected. Combinations of wildtype and mutant reference
sequences are employed in several applications of the tiling
strategy.

B. Chip Design 

1. Basic Tiling Strategy 

The basic tiling strategy provides an array of
immobilized probes for analysis of target sequences showing a
high degree of sequence identity to one or more selected
reference sequences. The strategy is first illustrated for an
array that is subdivided into four probe sets, although it
will be apparent that in some situations, satisfactory results
are obtained from only two probe sets. A first probe set
comprises a plurality of probes exhibiting perfect
complementarity with a selected reference sequence. The
perfect complementarity usually exists throughout the length
of the probe. However, probes having a segment or segments of
perfect complementarity that is/are flanked by leading or
trailing sequences lacking complementarity to the reference
sequence can also be used. Within a segment of
complementarity, each probe in the first probe set has at
least one interrogation position that corresponds to a
nucleotide in the reference sequence. That is, the
interrogation position is aligned with the corresponding
nucleotide in the reference sequence when the probe and
reference sequence are aligned to maximize complementarity
between the two. If a probe has more than one interrogation
position, each corresponds with a respective nucleotide in the
reference sequence. The identity of an interrogation position
and corresponding nucleotide in a particular probe in the
first probe set cannot be determined simply by inspection of
the probe in the first set. As will become apparent, an
interrogation position and corresponding nucleotide are defined
by the comparative structures of probes in the first probe set
and corresponding probes from additional probe sets.

In principle, a probe could have an interrogation 
position at each position in the segment complementary to the 
reference sequence. Sometimes, interrogation positions 
provide more accurate data when located away from the ends of 
a segment of complementarity. Thus, typically a probe having 
a segment of complementarity of length x does not contain more 
than x-2 interrogation positions. Since probes are typically 
9-21 nucleotides, and usually all of a probe is complementary, 
a probe typically has 1-19 interrogation positions. Often the 
probes contain a single interrogation position, at or near the
center of the probe.

For each probe in the first set, there are, for purposes 
of the present illustration, three corresponding probes from 
three additional probe sets. See Fig. 1. Thus, there are 
four probes corresponding to each nucleotide of interest in 
the reference sequence. Each of the four corresponding probes 
has an interrogation position aligned with that nucleotide of 
interest. Usually, the probes from the three additional 
probe sets are identical to the corresponding probe from the 
first probe set with one exception. The exception is that at 
least one (and often only one) interrogation position, which 
occurs in the same position in each of the four corresponding 
probes from the four probe sets, is occupied by a different 
nucleotide in the four probe sets. For example, for an A 
nucleotide in the reference sequence, the corresponding probe
from the first probe set has its interrogation position
occupied by a T, and the corresponding probes from the
additional three probe sets have their respective
interrogation positions occupied by A, C, or G, a different
nucleotide in each probe. Of course, if a probe from the
first probe set comprises trailing or flanking sequences
lacking complementarity to the reference sequences (see
Fig. 2), these sequences need not be present in corresponding
probes from the three additional sets. Likewise, corresponding
probes from the three additional sets can contain leading or
trailing sequences outside the segment of complementarity that
are not present in the corresponding probe from the first
probe set. Occasionally, the probes from the additional three
probe sets are identical (with the exception of interrogation
position(s)) to a contiguous subsequence of the full
complementary segment of the corresponding probe from the
first probe set. In this case, the subsequence includes the
interrogation position and usually differs from the full-length
probe only in the omission of one or both terminal
nucleotides from the termini of a segment of complementarity.
That is, if a probe from the first probe set has a segment of 
complementarity of length n, corresponding probes from the 
other sets will usually include a subsequence of the segment 
of at least length n-2. Thus, the subsequence is usually at 
least 3, 4, 7, 9, 15, 21, or 25 nucleotides long, most 
typically, in the range of 9-21 nucleotides. The subsequence 
should be sufficiently long to allow a probe to hybridize 
detectably more strongly to a variant of the reference 
sequence mutated at the interrogation position than to the 
reference sequence. 
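The construction of the four corresponding probes for one nucleotide of interest can be sketched in a few lines of Python. This is an illustrative sketch only, assuming a single interrogation position at the center of an odd-length probe and probes written 5' to 3' as reverse complements of the reference; the names `reverse_complement` and `tile_column` are hypothetical, and leading or trailing non-complementary sequences are not modelled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_column(reference: str, pos: int, length: int = 11) -> dict:
    """Build the four corresponding probes that interrogate reference[pos].

    Each probe is complementary to a window of the reference centred on
    `pos`; the dict key is the base placed at the central interrogation
    position, which is the only position at which the four probes differ.
    """
    if length % 2 == 0:
        raise ValueError("an odd length keeps the interrogation position central")
    half = length // 2
    if pos < half or pos + half >= len(reference):
        raise ValueError("probe window would run off the reference")
    # First-set probe: perfectly complementary to the reference window.
    perfect = reverse_complement(reference[pos - half:pos + half + 1])
    return {base: perfect[:half] + base + perfect[half + 1:] for base in "ACGT"}


reference = "AAAAACAAAAA"
column = tile_column(reference, pos=5, length=5)
# The first-set member carries the complement of the reference base (C -> G).
first_set_probe = column[COMPLEMENT[reference[5]]]
```

Because an odd-length window keeps its central base in the center after reverse complementation, the interrogation position stays aligned with the nucleotide of interest in all four probes.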

The probes can be oligodeoxyribonucleotides or
oligoribonucleotides, or any modified forms of these polymers
that are capable of hybridizing with a target nucleic acid
sequence by complementary base pairing. Complementary base
pairing means sequence-specific base pairing, which includes,
e.g., Watson-Crick base pairing as well as other forms of base
pairing such as Hoogsteen base pairing. Modified forms
include 2'-O-methyl oligoribonucleotides and so-called PNAs,
in which nucleobases are linked via a peptide-like backbone
rather than phosphodiester bonds. The probes can be
attached by any linkage to a support (e.g., 3', 5' or via the
base). 3' attachment is more usual as this orientation is
compatible with the preferred chemistry for solid phase
synthesis of oligonucleotides.

The number of probes in the first probe set (and as a 
consequence the number of probes in additional probe sets) 
depends on the length of the reference sequence, the number of 
nucleotides of interest in the reference sequence and the 
number of interrogation positions per probe. In general, each 
nucleotide of interest in the reference sequence requires the 
same interrogation position in the four sets of probes. 
Consider, as an example, a reference sequence of 100 
nucleotides, 50 of which are of interest, and probes each 
having a single interrogation position. In this situation, 
the first probe set requires fifty probes, each having one 
interrogation position corresponding to a nucleotide of 
interest in the reference sequence. The second, third and 
fourth probe sets each have a corresponding probe for each 
probe in the first probe set, and so each also contains a 
total of fifty probes. The identity of each nucleotide of 
interest in the reference sequence is determined by comparing 
the relative hybridization signals at four probes having 
interrogation positions corresponding to that nucleotide from 
the four probe sets. 
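The arithmetic of this example, and the comparison of the four signals for one nucleotide, can be sketched as follows. The function names are hypothetical, and two conventions are assumptions: signals are keyed by the base at each probe's interrogation position, and the called target base is the complement of that base in the strongest probe. The optional `call_ratio` threshold mirrors the call ratio described later for reading the chips.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probes_required(nucleotides_of_interest: int,
                    positions_per_probe: int = 1,
                    probe_sets: int = 4) -> int:
    """Probes needed so every nucleotide of interest is interrogated in every set."""
    per_set = -(-nucleotides_of_interest // positions_per_probe)  # ceiling division
    return per_set * probe_sets


def call_base(signals: dict, call_ratio: float = 1.0) -> str:
    """Call one target nucleotide from the four corresponding probes.

    `signals` maps the base at each probe's interrogation position to its
    hybridization signal. The strongest probe carries the complement of the
    target base, so that complement is returned; "N" is returned when the
    best signal does not exceed call_ratio times the second best.
    """
    ranked = sorted(signals, key=signals.get, reverse=True)
    best, second = signals[ranked[0]], signals[ranked[1]]
    if best <= call_ratio * second:
        return "N"
    return COMPLEMENT[ranked[0]]


print(probes_required(50))  # 50 probes per set x 4 sets
column = {"A": 120.0, "C": 950.0, "G": 80.0, "T": 60.0}  # hypothetical signals
print(call_base(column), call_base(column, call_ratio=10.0))
```

With these hypothetical signals the C probe dominates, so the sketch calls G in the target; raising `call_ratio` to 10 turns the same column into an ambiguous N.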

In some reference sequences, every nucleotide is of 
interest. In other reference sequences, only certain portions 
in which variants (e.g., mutations or polymorphisms) are 
concentrated are of interest. In other reference sequences, 
only particular mutations or polymorphisms and immediately 
adjacent nucleotides are of interest. Usually, the first 
probe set has interrogation positions selected to correspond 
to at least a nucleotide (e.g., representing a point mutation) 
and one immediately adjacent nucleotide. Usually, the probes 
in the first set have interrogation positions corresponding to 
at least 3, 10, 50, 100, 1000, or 20,000 contiguous 
nucleotides. The probes usually have interrogation positions
corresponding to at least 5, 10, 30, 50, 75, 90, 99 or
sometimes 100% of the nucleotides in a reference sequence.
Frequently, the probes in the first probe set completely span
the reference sequence and overlap with one another relative
to the reference sequence. For example, in one common
arrangement each probe in the first probe set differs from
another probe in that set by the omission of a 3' base
complementary to the reference sequence and the acquisition of
a 5' base complementary to the reference sequence. See
Fig. 3.

For conceptual simplicity, the probes in a set are
usually arranged in order of the sequence in a lane across the
chip. A lane contains a series of overlapping probes, which
represent, or tile across, the selected reference sequence (see
Fig. 3). The components of the four sets of probes are
usually laid down in four parallel lanes, collectively
constituting a row in the horizontal direction and a series of
4-member columns in the vertical direction. Corresponding
probes from the four probe sets (i.e., complementary to the
same subsequence of the reference sequence) occupy a column.
Each probe in a lane usually differs from its predecessor in
the lane by the omission of a base at one end and the
inclusion of an additional base at the other end, as shown in
Fig. 3. However, this orderly progression of probes can be
interrupted by the inclusion of control probes or omission of
probes in certain columns of the array. Such columns serve as
controls to orient the chip or to gauge the background, which
can include target sequence nonspecifically bound to the chip.

The probe sets are usually laid down in lanes such that
all probes having an interrogation position occupied by an A
form an A-lane, all probes having an interrogation position
occupied by a C form a C-lane, all probes having an
interrogation position occupied by a G form a G-lane, and all
probes having an interrogation position occupied by a T (or U)
form a T-lane (or a U-lane). Note that in this arrangement
there is not a unique correspondence between probe sets and
lanes. Thus, the probe from the first probe set is laid down
in the A-lane, C-lane, A-lane, A-lane and T-lane for the five
columns in Fig. 4. The interrogation position on a column of
probes corresponds to the position in the target sequence
whose identity is determined from analysis of hybridization to
the probes in that column. Thus, I1-I5 respectively
correspond to N1-N5 in Fig. 4. The interrogation position can
be anywhere in a probe but is usually at or near the central
position of the probe to maximize differential hybridization
signals between a perfect match and a single-base mismatch.
For example, for an 11-mer probe, the central position is the
sixth nucleotide.
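The lane layout can be sketched in Python under the same illustrative assumptions (odd-length probes, central interrogation position, probes written 5' to 3'); `tile_lanes` and `first_set_lanes` are hypothetical names. The short reference below is invented so that the first-set probes fall in the A-, C-, A-, A- and T-lanes, mirroring the pattern described for Fig. 4; it is not the sequence shown in that figure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (read 5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def tile_lanes(reference: str, length: int = 11) -> dict:
    """Lay out A-, C-, G- and T-lanes; list index i is the i-th column."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the interrogation position is central")
    half = length // 2
    lanes = {base: [] for base in "ACGT"}
    for pos in range(half, len(reference) - half):
        perfect = reverse_complement(reference[pos - half:pos + half + 1])
        for base in "ACGT":
            # A lane is defined by the base at the interrogation position.
            lanes[base].append(perfect[:half] + base + perfect[half + 1:])
    return lanes


def first_set_lanes(reference: str, length: int = 11) -> list:
    """Lane holding the perfectly complementary (first-set) probe in each column."""
    half = length // 2
    return [COMPLEMENT[reference[pos]] for pos in range(half, len(reference) - half)]


ref = "ATGTTAC"  # hypothetical reference, not the sequence of Fig. 4
lanes = tile_lanes(ref, 3)
print(first_set_lanes(ref, 3))
```

Successive first-set probes produced this way differ by the loss of the 3' base and the gain of a 5' base, as described for Fig. 3, while the lane that holds the first-set probe changes from column to column.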

Although the array of probes is usually laid down in rows 
and columns as described above, such a physical arrangement of 
probes on the chip is not essential. Provided that the 
spatial location of each probe in an array is known, the data 
from the probes can be collected and processed to yield the
sequence of a target irrespective of the physical arrangement 
of the probes on a chip. In processing the data, the 
hybridization signals from the respective probes can be 
reassorted into any conceptual array desired for subsequent 
data reduction whatever the physical arrangement of probes on 
the chip. 
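The reassortment step can be sketched as a lookup from each physical synthesis site to the conceptual column and lane its probe belongs to. This is a minimal illustration, assuming the chip design supplies such a map; the `Feature` record, its coordinate system and the intensities are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    """One synthesis site: where it sits on the chip and what it interrogates."""
    x: int       # physical column on the chip (hypothetical coordinates)
    y: int       # physical row on the chip
    column: int  # conceptual column: index of the interrogated target position
    lane: str    # base at the probe's interrogation position


def reassort(features: List[Feature],
             intensities: Dict[Tuple[int, int], float]) -> Dict[int, Dict[str, float]]:
    """Regroup scanned intensities into a conceptual {column: {lane: signal}} array."""
    grid: Dict[int, Dict[str, float]] = {}
    for f in features:
        grid.setdefault(f.column, {})[f.lane] = intensities[(f.x, f.y)]
    return grid


# Four probes for conceptual column 0, deliberately scattered on the chip.
layout = [Feature(7, 2, 0, "A"), Feature(0, 9, 0, "C"),
          Feature(3, 3, 0, "G"), Feature(5, 8, 0, "T")]
scan = {(7, 2): 110.0, (0, 9): 40.0, (3, 3): 35.0, (5, 8): 900.0}
print(reassort(layout, scan))
```

Once regrouped, the column can be read exactly as if the four probes had been synthesized side by side.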

A range of lengths of probes can be employed in the
chips. As noted above, a probe may consist exclusively of a
complementary segment, or may have one or more complementary
segments juxtaposed by flanking, trailing and/or intervening
segments. In the latter situation, the total length of
complementary segment(s) is more important than the length of
the probe. In functional terms, the complementarity
segment(s) of probes in the first probe set should be
sufficiently long to allow the probe to hybridize detectably
more strongly to a reference sequence compared with a variant
of the reference including a single base mutation at the
nucleotide corresponding to the interrogation position of the
probe. Similarly, the complementarity segment(s) in
corresponding probes from additional probe sets should be
sufficiently long to allow a probe to hybridize detectably
more strongly to a variant of the reference sequence having a
single nucleotide
substitution at the interrogation position relative to the
reference sequence. A probe usually has a single
complementary segment having a length of at least
3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or
30 bases exhibiting perfect complementarity (other than
possibly at the interrogation position(s), depending on the
probe set) to the reference sequence. In bridging strategies, 
where more than one segment of complementarity is present, 
each segment provides at least three complementary nucleotides 
to the reference sequence and the combined segments provide at 
least two segments of three or a total of six complementary 
nucleotides. As in the other strategies, the combined length 
of complementary segments is typically from 6-30 nucleotides, 
and preferably from about 9-21 nucleotides. The two segments 
are often approximately the same length. Often, the probes 
(or segment of complementarity within probes) have an odd 
number of bases, so that an interrogation position can occur 
in the exact center of the probe. 

In some chips, all probes are the same length. Other
chips employ different groups of probe sets, in which case the
probes are of the same size within a group, but differ between
different groups. For example, some chips have one group
comprising four sets of probes as described above in which all
the probes are 11-mers, together with a second group
comprising four sets of probes in which all of the probes are
13-mers. Of course, additional groups of probes can be added.
Thus, some chips contain, e.g., four groups of probes having
sizes of 11-mers, 13-mers, 15-mers and 17-mers. Other chips
have different size probes within the same group of four probe
sets. In these chips, the probes in the first set can vary in
length independently of each other. Probes in the other sets
are usually the same length as the probe occupying the same
column from the first set. However, occasionally different
lengths of probes can be included at the same column position
in the four lanes. The different length probes are included
to equalize hybridization signals from probes irrespective of
whether A-T or C-G bonds are formed at the interrogation
position.

The length of probe can be important in distinguishing
between a perfectly matched probe and probes showing a
single-base mismatch with the target sequence. The
discrimination is usually greater for short probes. Shorter
probes are usually also less susceptible to formation of
secondary structures. However, the absolute amount of target
sequence bound, and hence the signal, is greater for longer
probes. The probe length representing the optimum compromise
between these competing considerations may vary depending on,
inter alia, the GC content of a particular region of the target
DNA sequence, secondary structure, synthesis efficiency and
cross-hybridization. In some regions of the target, depending
on hybridization conditions, short probes (e.g., 11-mers) may
provide information that is inaccessible from longer probes
(e.g., 19-mers) and vice versa. Maximum sequence information
can be read by including several groups of different sized
probes on the chip as noted above. However, for many regions
of the target sequence, such a strategy provides redundant
information in that the same sequence is read multiple times
from the different groups of probes. Equivalent information
can be obtained from a single group of different sized probes
in which the sizes are selected to maximize readable sequence
at particular regions of the target sequence. The appropriate
size of probes at different regions of the target sequence can
be determined from, e.g., Fig. 12, which compares the
readability of different sized probes in different regions of
a target. The strategy of customizing probe length within a
single group of probe sets minimizes the total number of
probes required to read a particular target sequence. This
leaves ample capacity for the chip to include probes to other
reference sequences.

The invention provides an optimization block which allows
systematic variation of probe length and interrogation
position to optimize the selection of probes for analyzing a
particular nucleotide in a reference sequence. The block
comprises alternating columns of probes complementary to the
wildtype target and probes complementary to a specific
mutation. The interrogation position is varied between
columns and probe length is varied down a column.
Hybridization of the chip to the reference sequence or the
mutant form of the reference sequence identifies the probe
length and interrogation position providing the greatest
differential hybridization signal.
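Selecting the best cell of an optimization block can be sketched as below. The text only asks for the greatest differential signal; scoring each (length, interrogation position) cell by the worse of its two perfect-match/mismatch ratios, one from each hybridization, is an assumption made for this illustration, as are the function names and the intensities, which are hypothetical and assumed background-subtracted and positive.

```python
from typing import Dict, Tuple

Design = Tuple[int, int]    # (probe length, interrogation position; convention assumed)
Pair = Tuple[float, float]  # (signal at wildtype probe, signal at mutant probe)


def discrimination(wt_hyb: Pair, mut_hyb: Pair) -> float:
    """Worst-case perfect-match/mismatch ratio over the two hybridizations.

    wt_hyb:  signals after hybridizing the wildtype target
    mut_hyb: signals after hybridizing the mutant target
    """
    wt_ratio = wt_hyb[0] / wt_hyb[1]     # wildtype probe should win here
    mut_ratio = mut_hyb[1] / mut_hyb[0]  # mutant probe should win here
    return min(wt_ratio, mut_ratio)


def best_design(wt_runs: Dict[Design, Pair], mut_runs: Dict[Design, Pair]) -> Design:
    """Pick the block cell with the largest discrimination."""
    return max(wt_runs, key=lambda d: discrimination(wt_runs[d], mut_runs[d]))


# Hypothetical intensities for three cells of an optimization block.
wt_runs = {(11, 6): (1000.0, 200.0), (13, 7): (1500.0, 250.0), (15, 8): (2000.0, 1000.0)}
mut_runs = {(11, 6): (300.0, 900.0), (13, 7): (350.0, 1400.0), (15, 8): (900.0, 1800.0)}
print(best_design(wt_runs, mut_runs))
```

Taking the minimum of the two ratios favours a design that discriminates well for both the wildtype and the mutant target, rather than one that excels for only one of them.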

The probes are designed to be complementary to either
strand of the reference sequence (e.g., coding or noncoding).
Some chips contain separate groups of probes, one
complementary to the coding strand, the other complementary to
the noncoding strand. Independent analysis of coding and
noncoding strands provides largely redundant information.
However, the regions of ambiguity in reading the coding strand
are not always the same as those in reading the noncoding
strand. Thus, combination of the information from coding and
noncoding strands increases the overall accuracy of
sequencing.
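One way to combine the two strands is sketched below. The merge rule is an assumption for illustration, not a procedure stated in the text: noncoding-strand calls are first converted to coding-strand orientation, an ambiguous call ("N") on one strand is filled from the other, and conflicting calls are left ambiguous. `combine_strand_calls` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def combine_strand_calls(coding: str, noncoding: str) -> str:
    """Merge base calls read from coding- and noncoding-strand probe groups.

    `coding` is read 5'->3' on the coding strand, `noncoding` 5'->3' on the
    noncoding strand, and "N" marks an ambiguous call.
    """
    if len(coding) != len(noncoding):
        raise ValueError("both strands must cover the same region")
    # Convert noncoding calls into coding-strand orientation.
    as_coding = "".join(COMPLEMENT[b] for b in reversed(noncoding))
    merged = []
    for c, n in zip(coding, as_coding):
        if c == n or n == "N":
            merged.append(c)
        elif c == "N":
            merged.append(n)
        else:
            merged.append("N")  # conflicting calls stay ambiguous
    return "".join(merged)


print(combine_strand_calls("ACNTG", "NACGT"))
```

Here each strand is ambiguous at a different position, so the merged read is complete, which is the situation the paragraph above describes.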

Some chips contain additional probes or groups of probes
designed to be complementary to a second reference sequence.
The second reference sequence is often a subsequence of the
first reference sequence bearing one or more commonly
occurring mutations or interstrain variations. The second
group of probes is designed by the same principles as
described above except that the probes exhibit complementarity
to the second reference sequence. The inclusion of a second
group is particularly useful for analyzing short subsequences of
the primary reference sequence in which multiple mutations are
expected to occur within a short distance commensurate with
the length of the probes (i.e., two or more mutations within 9
to 21 bases). Of course, the same principle can be extended
to provide chips containing groups of probes for any number of
reference sequences. Alternatively, the chips may contain
additional probe(s) that do not form part of a tiled array as
noted above, but rather serve as probe(s) for a conventional
reverse dot blot. For example, the presence of a mutation can
be detected from binding of a target sequence to a single
oligomeric probe harboring the mutation. Preferably, an
additional probe containing the equivalent region of the
wildtype sequence is included as a control.

The chips are read by comparing the intensities of labelled target bound to the probes in an array. Specifically, a comparison is performed between each lane of probes (e.g., A, C, G and T lanes) at each columnar position (physical or conceptual). For a particular columnar position, the lane showing the greatest hybridization signal is called as the nucleotide present at the position in the target sequence corresponding to the interrogation position in the probes. See Fig. 5. The corresponding position in the target sequence is that aligned with the interrogation position in corresponding probes when the probes and target are aligned to maximize complementarity. Of the four probes in a column, only one can exhibit a perfect match to the target sequence, whereas the others usually exhibit at least a one base pair mismatch. The probe exhibiting a perfect match usually produces a substantially greater hybridization signal than the other three probes in the column and is thereby easily identified. However, in some regions of the target sequence, the distinction between a perfect match and a one-base mismatch is less clear. Thus, a call ratio is established to define the ratio of signal from the best hybridizing probe to the second best hybridizing probe that must be exceeded for a particular target position to be read from the probes. A high call ratio ensures that few if any errors are made in calling target nucleotides, but can result in some nucleotides being scored as ambiguous that could in fact be accurately read. A lower call ratio results in fewer ambiguous calls, but can result in more erroneous calls. It has been found that at a call ratio of 1.2 virtually all calls are accurate. However, a small but significant number of bases (e.g., up to about 10%) may have to be scored as ambiguous.
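The call-ratio rule lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the software used with these chips: each column's four intensities are assumed to be background-corrected and keyed by the target nucleotide each lane reports, and the names `call_base` and `call_sequence` and the intensities are hypothetical.

```python
from typing import Dict, List, Optional


def call_base(signals: Dict[str, float], call_ratio: float = 1.2) -> Optional[str]:
    """Call one column, or return None if the column is ambiguous.

    The strongest lane is called only if its signal exceeds `call_ratio`
    times the second-strongest signal (hypothetical helper).
    """
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if second <= 0:
        # Avoid dividing by zero: any positive best signal wins outright.
        return best_base if best > 0 else None
    return best_base if best / second > call_ratio else None


def call_sequence(columns: List[Dict[str, float]], call_ratio: float = 1.2) -> str:
    """Read a target from successive columns; ambiguous positions become 'N'."""
    return "".join(call_base(col, call_ratio) or "N" for col in columns)


# Hypothetical intensities for three columns.
columns = [
    {"A": 900, "C": 120, "G": 100, "T": 90},  # ratio 7.5, called A
    {"A": 300, "C": 280, "G": 50, "T": 40},   # ratio about 1.07, ambiguous
    {"A": 60, "C": 70, "G": 80, "T": 640},    # ratio 8.0, called T
]
print(call_sequence(columns))  # ANT
```

Raising `call_ratio` converts more borderline columns to `N`; lowering it (for example to 1.05) would call the second column as A. This is the accuracy-versus-ambiguity trade-off described above.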

Although small regions of the target sequence can sometimes be ambiguous, these regions usually occur at the same or similar segments in different target sequences. Thus, for precharacterized mutations, it is known in advance whether that mutation is likely to occur within a region of unambiguously determinable sequence.

An array of probes is most useful for analyzing the reference sequence from which the probes were designed and variants of that sequence exhibiting substantial sequence similarity with the reference sequence (e.g., several single-base mutants spaced over the reference sequence). When an array is used to analyze the exact reference sequence from which it was designed, one probe exhibits a perfect match to the reference sequence, and the other three probes in the same column exhibit single-base mismatches. Thus, discrimination between hybridization signals is usually high and accurate sequence is obtained. High accuracy is also obtained when an array is used for analyzing a target sequence comprising a variant of the reference sequence that has a single mutation relative to the reference sequence, or several widely spaced mutations relative to the reference sequence. At different mutant loci, one probe exhibits a perfect match to the target, and the other three probes occupying the same column exhibit single-base mismatches, the difference (with respect to analysis of the reference sequence) being the lane in which the perfect match occurs.

For target sequences showing a high degree of divergence from the reference strain or incorporating several closely spaced mutations from the reference strain, a single group of probes (i.e., designed with respect to a single reference sequence) will not always provide accurate sequence for the highly variant region of this sequence. At some particular columnar positions, it may be that no single probe exhibits perfect complementarity to the target and that any comparison must be based on different degrees of mismatch between the four probes. Such a comparison does not always allow the target nucleotide corresponding to that columnar position to be called. Deletions in target sequences can be detected by loss of signal from probes having interrogation positions encompassed by the deletion. However, signal may also be lost from probes having interrogation positions closely proximal to the deletion, resulting in some regions of the target sequence that cannot be read. Target sequences bearing insertions will also exhibit short regions including and proximal to the insertion that usually cannot be read.

The presence of short regions of difficult-to-read target
sequence, caused by closely spaced mutations, insertions or deletions,
does not prevent determination of the remaining sequence of 
the target as different regions of a target sequence are 
determined independently. Moreover, such ambiguities as might 
result from analysis of diverse variants with a single group 
of probes can be avoided by including multiple groups of probe 
sets on a chip. For example, one group of probes can be 
designed based on a full-length reference sequence, and the 
other groups on subsequences of the reference sequence 
incorporating frequently occurring mutations or strain 
variations. 

A particular advantage of the present sequencing strategy 
over conventional sequencing methods is the capacity 
simultaneously to detect and quantify proportions of multiple 
target sequences. Such capacity is valuable, e.g., for 
diagnosis of patients who are heterozygous with respect to a 
gene or who are infected with a virus, such as HIV, which is 
usually present in several polymorphic forms. Such capacity 
is also useful in analyzing targets from biopsies of tumor 
cells and surrounding tissues. The presence of multiple 
target sequences is detected from the relative signals of the 
four probes at the array columns corresponding to the target 
nucleotides at which diversity occurs. The relative signals 
at the four probes for the mixture under test are compared 
with the corresponding signals from a homogeneous reference 
sequence. An increase in a signal from a probe that is 
mismatched with respect to the reference sequence, and a 
corresponding decrease in the signal from the probe that is
matched with the reference sequence, signal the presence of a
mutant strain in the mixture. The extent of the shift in
hybridization signals of the probes is related to the 
proportion of a target sequence in the mixture. Shifts in 
relative hybridization signals can be quantitatively related 
to proportions of reference and mutant sequence by prior 
calibration of the chip with seeded mixtures of the mutant and
reference sequences. By this means, a chip can be used to
detect variant or mutant strains constituting as little as 1,
5, 20, or 25% of a mixture of strains.
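The calibration step can be sketched as a lookup on a curve measured with seeded mixtures. This is an illustration under assumptions, since the text does not say how the shift is quantified: the sketch uses the mutant-matched probe's share of the two signals as the index and interpolates linearly between calibration points. The function names and calibration values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def mutant_index(mutant_signal: float, reference_signal: float) -> float:
    """Share of the column's signal carried by the mutant-matched probe."""
    return mutant_signal / (mutant_signal + reference_signal)


def estimate_fraction(index: float, calibration: List[Tuple[float, float]]) -> float:
    """Piecewise-linear lookup of mutant proportion from a calibration curve.

    `calibration` holds (observed_index, known_fraction) pairs measured on
    seeded mixtures; indices outside the calibrated range are clamped.
    """
    points = sorted(calibration)
    xs = [x for x, _ in points]
    if index <= xs[0]:
        return points[0][1]
    if index >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, index)  # first calibration point at or above index
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (index - x0) / (x1 - x0)


# Hypothetical calibration: (observed index, known mutant fraction).
calibration = [(0.05, 0.0), (0.15, 0.10), (0.30, 0.25), (0.55, 0.50), (0.95, 1.0)]
fraction = estimate_fraction(mutant_index(200, 800), calibration)
print(round(fraction, 3))  # 0.15
```

A real calibration would likely be made per column, since hybridization efficiency varies along the chip; the single curve here only shows the lookup.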

Similar principles allow the simultaneous analysis of 
multiple target sequences even when none is identical to the 
reference sequence. For example, with a mixture of two target 
sequences bearing first and second mutations, there would be a 
variation in the hybridization patterns of probes having 
interrogation positions corresponding to the first and second 
mutations relative to the hybridization pattern with the 
reference sequence. At each position, one of the probes 
having a mismatched interrogation position relative to the 
reference sequence would show an increase in hybridization 
signal, and the probe having a matched interrogation position 
relative to the reference sequence would show a decrease in 
hybridization signal. Analysis of the hybridization pattern 
of the mixture of mutant target sequences, preferably in 
comparison with the hybridization pattern of the reference 
sequence, indicates the presence of two mutant target 
sequences, the position and nature of the mutation in each 
strain, and the relative proportions of each strain. 

In a variation of the above method, the different 
components in a mixture of target sequences are differentially 
labelled before being applied to the array. For example, a 
variety of fluorescent labels emitting at different wavelengths
are available. The use of differential labels allows 
independent analysis of different targets bound simultaneously 
to the array. For example, the methods permit comparison of 
target sequences obtained from a patient at different stages 
of a disease. 

2. Omission of Probes 
The general strategy outlined above employs four probes to read each nucleotide of interest in a target sequence. One probe (from the first probe set) shows a perfect match to the reference sequence and the other three probes (from the second, third and fourth probe sets) exhibit a mismatch with the reference sequence and a perfect match with a target sequence bearing a mutation at the nucleotide of interest. The provision of three probes from the second, third and fourth probe sets allows detection of each of the three possible nucleotide substitutions of any nucleotide of interest. However, in some reference sequences or regions of reference sequences, it is known in advance that only certain mutations are likely to occur. Thus, for example, at one site it might be known that an A nucleotide in the reference sequence may exist as a T mutant in some target sequences but is unlikely to exist as a C or G mutant. Accordingly, for analysis of this region of the reference sequence, one might include only the first and second probe sets, the first probe set exhibiting perfect complementarity to the reference sequence, and the second probe set having an interrogation position occupied by an invariant A residue (for detecting the T mutant). In other situations, one might include the first, second and third probe sets (but not the fourth) for detection of a wildtype nucleotide in the reference sequence and two mutant variants thereof in target sequences. In some chips, probes that would detect silent mutations (i.e., mutations not affecting amino acid sequence) are omitted.
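A sketch of how probes for one column might be generated when only some substitutions are expected. It assumes probes are written base by base as the complement of the target window (aligned with the target, as in the pooling example later in this section) and that the interrogation position is centred in the window; `tile_column`, `allowed_variants` and `include_reference_probe` are hypothetical names. Setting `include_reference_probe=False` corresponds to the chips, described in the following paragraph, that omit the perfectly complementary probes.

```python
from typing import Dict, Iterable, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq: str) -> str:
    """Base-by-base complement, written aligned with the target."""
    return "".join(COMPLEMENT[b] for b in seq)


def tile_column(reference: str, pos: int, length: int,
                allowed_variants: Optional[Iterable[str]] = None,
                include_reference_probe: bool = True) -> Dict[str, str]:
    """Probes interrogating reference[pos], keyed by the target base each detects."""
    half = length // 2
    start = pos - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe window runs off the reference")
    window = reference[start:start + length]
    wild = reference[pos]
    if allowed_variants is None:
        variants = set("ACGT") - {wild}             # full tiling: all three substitutions
    else:
        variants = set(allowed_variants) - {wild}   # only the expected mutations
    bases = sorted(variants)
    if include_reference_probe:
        bases = [wild] + bases                      # first probe set: perfect complement
    probes = {}
    for base in bases:
        target = window[:half] + base + window[half + 1:]
        probes[base] = complement(target)
    return probes


# Reference A at position 4 is expected to mutate only to T.
print(tile_column("TTGCAGACCTG", 4, 7, allowed_variants={"T"}))
# {'A': 'ACGTCTG', 'T': 'ACGACTG'}
```

The T-detecting probe carries an A at its interrogation position, matching the "invariant A residue" in the example above.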

In some chips, probes from the first probe set are omitted for some or all positions of the reference sequence. Such chips comprise at least two probe sets. The first probe set has a plurality of probes. Each probe comprises a segment exactly complementary to a subsequence of a reference sequence except in at least one interrogation position. A second probe set has a corresponding probe for each probe in the first probe set. The corresponding probe in the second probe set is identical to a sequence comprising the corresponding probe from the first probe set, or a subsequence thereof that includes the at least one (and usually only one) interrogation position, except that the at least one interrogation position is occupied by a different nucleotide in each of the two corresponding probes from the first and second probe sets. A third probe set, if present, also comprises a corresponding probe for each probe in the first probe set, identical to it except at the at least one interrogation position, which differs in the corresponding probes from the three sets. Omission of probes having a segment exhibiting perfect complementarity to the reference sequence results in loss of control information, i.e., the detection of nucleotides in a target sequence that are the same as those in a reference sequence. However, similar information can be obtained by hybridizing a chip lacking probes from the first probe set to both target and reference sequences. The hybridization can be performed sequentially, or concurrently if the target and reference are differentially labelled. In this situation, the presence of a mutation is detected by a shift from the background hybridization intensity of the reference sequence to a perfectly matched hybridization signal of the target sequence, rather than by a comparison of the hybridization intensities of probes from the first set with corresponding probes from the second, third and fourth sets.
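The reference-versus-target comparison for a chip lacking the perfectly complementary probes might be sketched as follows. The fold-change threshold, the assumption that the two labels' signals are already normalized to each other, and all names are illustrative choices, not values from the text.

```python
from typing import Dict, List, Tuple


def detect_mutations(reference_signals: List[Dict[str, float]],
                     target_signals: List[Dict[str, float]],
                     fold: float = 3.0,
                     floor: float = 1e-9) -> List[Tuple[int, str]]:
    """Flag columns where a mismatch probe rises from reference background.

    Each list entry maps the variant base a probe detects to its intensity.
    Signals are assumed to be normalized between the two labels.
    """
    calls = []
    for column, (ref, tgt) in enumerate(zip(reference_signals, target_signals)):
        # Probe whose signal rose most relative to its reference background.
        best = max(tgt, key=lambda base: tgt[base] / max(ref[base], floor))
        if tgt[best] > fold * max(ref[best], floor):
            calls.append((column, best))
    return calls


reference_signals = [{"C": 50, "G": 40, "T": 60}, {"C": 55, "G": 45, "T": 50}]
target_signals = [{"C": 48, "G": 52, "T": 58}, {"C": 60, "G": 400, "T": 52}]
print(detect_mutations(reference_signals, target_signals))  # [(1, 'G')]
```

In the first column every probe stays near its background level, so no mutation is called; in the second, the G-detecting probe shifts to a perfect-match level.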

3. Wildtype Probe Lane

When the chips comprise four probe sets, as discussed supra, and the probe sets are laid down in four lanes (an A lane, a C lane, a G lane and a T or U lane), the probe having a segment exhibiting perfect complementarity to a reference sequence varies between the four lanes from one column to another. This does not present any significant difficulty in computer analysis of the data from the chip. However, visual inspection of the hybridization pattern of the chip is sometimes facilitated by provision of an extra lane of probes, in which each probe has a segment exhibiting perfect complementarity to the reference sequence. See Fig. 4. This segment is identical to a segment from one of the probes in the other four lanes (which lane depending on the column position). The extra lane of probes (designated the wildtype lane) hybridizes to a target sequence at all nucleotide positions except those in which deviations from the reference sequence occur. The hybridization pattern of the wildtype lane thereby provides a simple visual indication of mutations.

4. Deletion, Insertion and Multiple-Mutation Probes
Some chips provide an additional probe set specifically designed for analyzing deletion mutations. The additional probe set comprises a probe corresponding to each probe in the first probe set as described above. However, a probe from the additional probe set differs from the corresponding probe in the first probe set in that the nucleotide occupying the interrogation position is deleted in the probe from the additional probe set. See Fig. 6. Optionally, the probe from the additional probe set bears an additional nucleotide at one of its termini relative to the corresponding probe from the first probe set. The probe from the additional probe set will hybridize more strongly than the corresponding probe from the first probe set to a target sequence having a single base deletion at the nucleotide corresponding to the interrogation position. Additional probe sets are provided in which not only the interrogation position, but also an adjacent nucleotide is deleted.

Similarly, other chips provide additional probe sets for analyzing insertions. For example, one additional probe set has a probe corresponding to each probe in the first probe set as described above. However, the probe in the additional probe set has an extra T nucleotide inserted adjacent to the interrogation position. See Fig. 6. Optionally, the probe has one fewer nucleotide at one of its termini relative to the corresponding probe from the first probe set. The probe from the additional probe set hybridizes more strongly than the corresponding probe from the first probe set to a target sequence having an A nucleotide inserted in a position adjacent to that corresponding to the interrogation position.
Similar additional probe sets are constructed having A, C or G nucleotides inserted adjacent to the interrogation position. Usually, four such probe sets, one for each nucleotide, are used in combination.
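Both probe types can be derived mechanically from the reference by taking the complement of a same-length window of the deletion or insertion mutant. The sketch below assumes the same conventions as a plain complement written aligned with the target and a centred interrogation position; the function names are hypothetical. Keeping the window length fixed is what produces the optional extra terminal nucleotide (deletion probe) and the optional missing terminal nucleotide (insertion probe) mentioned above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq: str) -> str:
    """Base-by-base complement, written aligned with the target."""
    return "".join(COMPLEMENT[b] for b in seq)


def _window(seq: str, start: int, length: int) -> str:
    if start < 0 or start + length > len(seq):
        raise ValueError("probe window runs off the sequence")
    return seq[start:start + length]


def deletion_probe(reference: str, pos: int, length: int) -> str:
    """Complement of the deletion mutant lacking reference[pos].

    With the window length fixed, the interrogation nucleotide disappears
    and one extra nucleotide is picked up at the far end of the probe.
    """
    start = pos - length // 2
    mutant = reference[:pos] + reference[pos + 1:]
    return complement(_window(mutant, start, length))


def insertion_probe(reference: str, pos: int, length: int, inserted: str) -> str:
    """Complement of the insertion mutant carrying `inserted` right after reference[pos].

    With the window length fixed, the original last nucleotide falls off.
    """
    start = pos - length // 2
    mutant = reference[:pos + 1] + inserted + reference[pos + 1:]
    return complement(_window(mutant, start, length))


reference = "TTGCAGACCTG"
print(complement(reference[1:8]))             # ACGTCTG (first probe set)
print(deletion_probe(reference, 4, 7))        # ACGCTGG
print(insertion_probe(reference, 4, 7, "A"))  # ACGTTCT
```

The insertion probe carries an extra T beside the interrogation position, which is the probe the text describes for detecting an inserted A.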

Other chips provide additional probes (multiple-mutation probes) for analyzing target sequences having multiple closely spaced mutations. A multiple-mutation probe is usually identical to a corresponding probe from the first set as described above, except in the base occupying the interrogation position, and except at one or more additional positions, corresponding to nucleotides in which substitution may occur in the reference sequence. The one or more additional positions in the multiple-mutation probe are occupied by nucleotides complementary to the nucleotides occupying corresponding positions in the reference sequence when the possible substitutions have occurred.
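Multiple-mutation probes can be sketched the same way, by writing the anticipated nearby substitutions into the target window before varying the interrogation position. The function name, the dictionary of substitutions (keyed by reference coordinate) and the example sequence are hypothetical.

```python
from typing import Dict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def multiple_mutation_probes(reference: str, pos: int, length: int,
                             substitutions: Dict[int, str]) -> Dict[str, str]:
    """Four probes interrogating reference[pos] on a background that already
    carries the anticipated nearby substitutions."""
    half = length // 2
    start = pos - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe window runs off the reference")
    target = list(reference[start:start + length])
    for ref_pos, base in substitutions.items():
        # Only substitutions inside the window, and never at the interrogation position.
        if ref_pos != pos and start <= ref_pos < start + length:
            target[ref_pos - start] = base
    probes = {}
    for base in "ACGT":
        target[half] = base  # vary the interrogation position
        probes[base] = "".join(COMPLEMENT[b] for b in target)
    return probes


# An A-to-G substitution is anticipated at reference position 6.
probes = multiple_mutation_probes("TTGCAGACCTG", 4, 7, {6: "G"})
print(probes["A"])  # ACGTCCG
```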

5. Block Tiling

As noted in the discussion of the general tiling strategy, a probe in the first probe set sometimes has more than one interrogation position. In this situation, a probe in the first probe set is sometimes matched with multiple groups of at least one, and usually three, additional probe sets. See Fig. 7. Three additional probe sets are used to allow detection of the three possible nucleotide substitutions at any one position. If only certain types of substitution are likely to occur (e.g., transitions), only one or two additional probe sets are required (analogous to the use of probes in the basic tiling strategy). To illustrate for the situation where a group comprises three additional probe sets, a first such group comprises second, third and fourth probe sets, each of which has a probe corresponding to each probe in the first probe set. The corresponding probes from the second, third and fourth probe sets differ from the corresponding probe in the first set at a first of the interrogation positions. Thus, the relative hybridization signals from corresponding probes from the first, second, third and fourth probe sets indicate the identity of the nucleotide in a target sequence corresponding to the first interrogation position. A second group of three probe sets (designated fifth, sixth and seventh probe sets) each also has a probe corresponding to each probe in the first probe set. These corresponding probes differ from that in the first probe set at a second interrogation position. The relative hybridization signals from corresponding probes from the first, fifth, sixth, and seventh probe sets indicate the identity of the nucleotide in the target sequence corresponding to the second interrogation position. As noted above, the probes in the first probe set often have seven or more interrogation positions. If there are seven interrogation positions, there are seven groups of three additional probe sets, each group of three probe sets serving to identify the nucleotide corresponding to one of the seven interrogation positions.

Each block of probes allows short regions of a target 
sequence to be read. For example, for a block of probes 
having seven interrogation positions, seven nucleotides in the 
target sequence can be read. Of course, a chip can contain 
any number of blocks depending on how many nucleotides of the 
target are of interest. The hybridization signals for each 
block can be analyzed independently of any other block. The 
block tiling strategy can also be combined with other tiling 
strategies, with different parts of the same reference 
sequence being tiled by different strategies. 

The block tiling strategy offers three advantages over the
basic strategy in which each probe in the first set has a 
single interrogation position. One advantage is that the same 
sequence information can be obtained from fewer probes. A 
second advantage is that each of the probes constituting a 
block (i.e., a probe from the first probe set and a 
corresponding probe from each of the other probe sets) can 
have identical 3' and 5' sequences, with the variation 
confined to a central segment containing the interrogation 
positions. The identity of 3' sequence between different
probes simplifies the strategy for solid phase synthesis of 
the probes on the chip and results in more uniform deposition 
of the different probes on the chip, thereby in turn 
increasing the uniformity of signal to noise ratio for 
different regions of the chip. A third advantage is that 
greater signal uniformity is achieved within a block. 
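A block can be sketched as one first-set probe plus three variants per interrogation position, which makes two of the stated advantages easy to check: seven positions need 1 + 3 x 7 = 22 probes instead of 4 x 7 = 28, and every probe in the block shares the same flanking sequence. The function name, the use of a plain complement, and the example reference are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple, Union

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
Key = Union[str, Tuple[int, str]]


def complement(seq: str) -> str:
    """Base-by-base complement, written aligned with the target."""
    return "".join(COMPLEMENT[b] for b in seq)


def block_tile(reference: str, start: int, length: int,
               offsets: Iterable[int]) -> Dict[Key, str]:
    """First-set probe plus, for each interrogation offset, the three variant probes."""
    window = reference[start:start + length]
    if len(window) != length:
        raise ValueError("probe window runs off the reference")
    block: Dict[Key, str] = {"first": complement(window)}
    for off in offsets:
        for base in "ACGT":
            if base == window[off]:
                continue  # already represented by the first-set probe
            variant = window[:off] + base + window[off + 1:]
            block[(off, base)] = complement(variant)
    return block


block = block_tile("ACGTTGCAGACCTGA", start=1, length=13, offsets=range(3, 10))
first = block["first"]
print(len(block))  # 22
print(all(p[:3] == first[:3] and p[-3:] == first[-3:] for p in block.values()))  # True
```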

6. Multiplex Tiling 
In the block tiling strategy discussed above, the 
identity of a nucleotide in a target or reference sequence is 
determined by comparison of hybridization patterns of one 
probe having a segment showing a perfect match with that of
other probes (usually three other probes) showing a single 
base mismatch. In multiplex tiling, the identity of at least 
two nucleotides in a reference or target sequence is 
determined by comparison of hybridization signal intensities 
of four probes, two of which have a segment showing perfect
complementarity or a single-base mismatch to the reference
sequence, and two of which have a segment showing a
double-base mismatch to the reference sequence. The
four probes whose hybridization patterns are to be compared 
each have a segment that is exactly complementary to a 
reference sequence except at two interrogation positions, in 
which the segment may or may not be complementary to the 
reference sequence. The interrogation positions correspond to 
the nucleotides in a reference or target sequence which are 
determined by the comparison of intensities. The nucleotides 
occupying the interrogation positions in the four probes are 
selected according to the following rule. The first 
interrogation position is occupied by a different nucleotide 
in each of the four probes. The second interrogation position 
is also occupied by a different nucleotide in each of the four 
probes. In two of the four probes, designated the first and 
second probes, the segment is exactly complementary to the 
reference sequence except at not more than one of the two 
interrogation positions. In other words, one of the 
interrogation positions is occupied by a nucleotide that is 
complementary to the corresponding nucleotide from the 
reference sequence and the other interrogation position may or 
may not be so occupied. In the other two of the four probes, 
designated the third and fourth probes, the segment is exactly 
complementary to the reference sequence except that both 
interrogation positions are occupied by nucleotides which are 
noncomplementary to the respective corresponding nucleotides 
in the reference sequence. 

There are a number of ways of satisfying these conditions, depending on whether the two nucleotides in the reference sequence corresponding to the two interrogation positions are the same or different. If these two nucleotides are different in the reference sequence (probability 3/4), the conditions are satisfied by each of the two interrogation positions being occupied by the same nucleotide in any given probe. For example, in the first probe, the two interrogation positions would both be A, in the second probe, both would be C, in the third probe, each would be G, and in the fourth probe each would be T or U. If the two nucleotides in the reference sequence corresponding to the two interrogation positions are the same (probability 1/4), the conditions noted above are satisfied by the two interrogation positions in any one of the four probes being occupied by complementary nucleotides. For example, in the first probe, the interrogation positions could be occupied by A and T, in the second probe by C and G, in the third probe by G and C, and in the fourth probe by T and A. See Fig. 8.

When the four probes are hybridized to a target that is 
the same as the reference sequence or differs from the 
reference sequence at one (but not both) of the interrogation 
positions, two of the four probes show a double-mismatch with 
the target and two probes show a single mismatch. The 
identity of probes showing these different degrees of mismatch 
can be determined from the different hybridization signals. 
From the identity of the probes showing the different degrees 
of mismatch, the nucleotides occupying both of the 
interrogation positions in the target sequence can be deduced. 
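The multiplex rule can be checked by enumeration. The sketch below builds the four probes from the reference pair, computes each probe's mismatch count against a target pair, and decodes by brute force over all targets that differ from the reference at no more than one interrogation position. It assumes the mismatch counts have already been inferred from the ranking of the four signals; the names are hypothetical. The enumeration also covers the few single-difference targets for which one probe happens to match perfectly (a count of 0).

```python
from itertools import product
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
Pair = Tuple[str, str]


def multiplex_probes(ref_pair: Pair) -> List[Pair]:
    """Bases at the two interrogation positions of the four probes."""
    x, y = ref_pair
    if x != y:
        return [(b, b) for b in "ACGT"]          # AA, CC, GG, TT
    return [(b, COMPLEMENT[b]) for b in "ACGT"]  # AT, CG, GC, TA


def mismatch_pattern(ref_pair: Pair, target_pair: Pair) -> List[int]:
    """Number of mismatches of each probe against the target at the two positions."""
    return [sum(p != COMPLEMENT[t] for p, t in zip(probe, target_pair))
            for probe in multiplex_probes(ref_pair)]


def decode(ref_pair: Pair, pattern: List[int]) -> List[Pair]:
    """Targets within one difference of ref_pair that would give `pattern`."""
    return [t for t in product("ACGT", repeat=2)
            if sum(a != r for a, r in zip(t, ref_pair)) <= 1
            and mismatch_pattern(ref_pair, t) == pattern]


print(mismatch_pattern(("A", "C"), ("A", "C")))  # [2, 2, 1, 1]
print(decode(("A", "C"), [2, 2, 1, 1]))          # [('A', 'C')]
```

A unique decode for every admissible target is what lets both interrogated nucleotides be read from only four probes.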

For ease of illustration, the multiplex strategy has been 
initially described for the situation where there are two 
nucleotides of interest in a reference sequence and only four 
probes in an array. Of course, the strategy can be extended 
to analyze any number of nucleotides in a target sequence by 
using additional probes. In one variation, each pair of
interrogation positions is read from a unique group of four 
probes. In a block variation, different groups of four probes 
exhibit the same segment of complementarity with the reference 
sequence, but the interrogation positions move within a block. 
The block and standard multiplex tiling variants can of course 
be used in combination for different regions of a reference 
sequence. Either or both variants can also be used in 
combination with any of the other tiling strategies described. 

7. Helper Mutations

Occasionally small regions of a reference sequence give a low hybridization signal as a result of self-annealing of probes. The self-annealing reduces the amount of probe effectively available for hybridizing to the target. Although such regions of the target are generally small and the reduction of hybridization signal is usually not so substantial as to obscure the sequence of this region, this concern can be avoided by the use of probes incorporating helper mutations. The helper mutation(s) serve to break up regions of internal complementarity within a probe and thereby prevent annealing. Usually, one or two helper mutations are quite sufficient for this purpose. The inclusion of helper mutations can be beneficial in any of the tiling strategies noted above. In general, each probe having a particular interrogation position has the same helper mutation(s). Thus, such probes have a segment in common which shows perfect complementarity with a reference sequence, except that the segment contains at least one helper mutation (the same in each of the probes) and at least one interrogation position (different in all of the probes). For example, in the basic tiling strategy, a probe from the first probe set comprises a segment containing an interrogation position and showing perfect complementarity with a reference sequence except for one or two helper mutations. The corresponding probes from the second, third and fourth probe sets usually comprise the same segment (or sometimes a subsequence thereof including the helper mutation(s) and interrogation position), except that the base occupying the interrogation position varies in each probe. See Fig. 9.
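The text does not say how self-complementary regions are identified, so the sketch below uses an assumed, deliberately simple criterion: the longest substring whose reverse complement occurs further along the probe, separated by a loop of at least `min_loop` bases (Watson-Crick pairs only, no thermodynamics). A single helper mutation inside the stem, away from the interrogation position, shortens that stem. The probe sequence, positions and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def longest_stem(probe: str, min_loop: int = 3) -> int:
    """Length of the longest hairpin stem under the simple criterion above."""
    n = len(probe)
    for k in range(n // 2, 0, -1):
        for i in range(n - k + 1):
            partner = reverse_complement(probe[i:i + k])
            # The partner arm must start at least min_loop bases after this arm ends.
            if probe.find(partner, i + k + min_loop) != -1:
                return k
    return 0


def apply_helper_mutation(probe: str, index: int, base: str, interrogation: int) -> str:
    """Substitute one base to break internal complementarity."""
    if index == interrogation:
        raise ValueError("helper mutation must not hit the interrogation position")
    return probe[:index] + base + probe[index + 1:]


probe = "TTACCGTAAAAACGGTTT"  # stem ACCGT/ACGGT closing an AAAA loop
helped = apply_helper_mutation(probe, 4, "A", interrogation=9)
print(longest_stem(probe), longest_stem(helped))  # 5 3
```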

Usually, the helper mutation tiling strategy is used in conjunction with one of the tiling strategies described above. The probes containing helper mutations are used to tile regions of a reference sequence otherwise giving low hybridization signal (e.g., because of self-complementarity), and the alternative tiling strategy is used to tile intervening regions.

8. Pooling Strategies

Pooling strategies also employ arrays of immobilized probes. Probes are immobilized in cells of an array, and the hybridization signal of each cell can be determined independently of any other cell. A particular cell may be occupied by a pooled mixture of probes. Although the identity of each probe in the mixture is known, the individual probes in the pool are not separately addressable. Thus, the hybridization signal from a cell is the aggregate of that of the different probes occupying the cell. In general, a cell is scored as hybridizing to a target sequence if at least one probe occupying the cell comprises a segment exhibiting perfect complementarity to the target sequence.

A simple strategy to show the increased power of pooled strategies over a standard tiling is to create three cells, each containing a pooled probe having a single pooled position, the pooled position being the same in each of the pooled probes. At the pooled position, there are two possible nucleotides, allowing the pooled probe to hybridize to two target sequences. In tiling terminology, the pooled position of each probe is an interrogation position. As will become apparent, comparison of the hybridization intensities of the pooled probes from the three cells reveals the identity of the nucleotide in the target sequence corresponding to the interrogation position (i.e., the nucleotide that is matched with the interrogation position when the target sequence and pooled probes are maximally aligned for complementarity).

The three cells are assigned probe pools that are perfectly complementary to the target except at the pooled position, which is occupied by a different pooled nucleotide in each probe as follows:
(using IUPAC standard ambiguity notation, with [AC] = M, [GT] = K and [AG] = R as substitutions in the probe; the pooled interrogation position X is shown in lower case in the expanded probes)

Target:  TAACCACTCACGGGAGCA

Pool 1:  ATTGGMGAGTGCCC
       = ATTGGaGAGTGCCC   (complement to mutant 't')
       + ATTGGcGAGTGCCC   (complement to mutant 'g')

Pool 2:  ATTGGKGAGTGCCC
       = ATTGGgGAGTGCCC   (complement to mutant 'c')
       + ATTGGtGAGTGCCC   (complement to wild type 'a')

Pool 3:  ATTGGRGAGTGCCC
       = ATTGGaGAGTGCCC   (complement to mutant 't')
       + ATTGGgGAGTGCCC   (complement to mutant 'c')

With 3 pooled probes, all 4 possible single base pair states (wild type and 3 mutants) are detected. A pool hybridizes with a target if some probe contained within that pool is complementary to that target.

                                  Hybridization?
Pool:                               1    2    3
Target:  TAACCACTCACGGGAGCA         n    y    n
Mutant:  TAACCcCTCACGGGAGCA         n    y    y
Mutant:  TAACCgCTCACGGGAGCA         y    n    n
Mutant:  TAACCtCTCACGGGAGCA         y    n    y
A cell containing a pair (or more) of oligonucleotides lights up when a target complementary to any of the oligonucleotides in the cell is present. Using the simple strategy, each of the four possible targets (wild type and three mutants) yields a unique hybridization pattern among the three cells.

Since a different pattern of hybridizing pools is obtained for each possible nucleotide in the target sequence corresponding to the pooled interrogation position in the probes, the identity of the nucleotide can be determined from the hybridization pattern of the pools. Whereas a standard tiling requires four cells to detect and identify the possible single-base substitutions at one location, this simple pooled strategy requires only three cells.
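The three-cell example can be reproduced in a few lines: a pool lights up if any of its interrogation-position bases pairs with the target base, and each target base then gives a distinct pattern. The list of pools mirrors the M, K and R cells above; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Interrogation-position bases pooled in the three cells (IUPAC M, K, R).
POOLS = [("M", "AC"), ("K", "GT"), ("R", "AG")]


def hybridization_pattern(target_base: str) -> Tuple[bool, ...]:
    """Which pools contain a probe base complementary to the target base."""
    needed = COMPLEMENT[target_base]
    return tuple(needed in bases for _, bases in POOLS)


def decode(pattern: Sequence[bool]) -> Optional[str]:
    """Target base consistent with an observed pattern, if exactly one is."""
    matches = [b for b in "ACGT" if hybridization_pattern(b) == tuple(pattern)]
    return matches[0] if len(matches) == 1 else None


for base in "ACGT":
    print(base, "".join("y" if hit else "n" for hit in hybridization_pattern(base)))
# A nyn
# C nyy
# G ynn
# T yny
```

The printed patterns reproduce the hybridization table above, row for row.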
A more efficient pooling strategy for sequence analysis
is the 'trellis' strategy. In this strategy, each pooled
probe has a segment of perfect complementarity to a reference
sequence except at three pooled positions. One pooled
position is an N pool. The three pooled positions may or may
not be contiguous in a probe. The other two pooled positions
are selected from the group of three pools consisting of (1) M
or K, (2) R or Y and (3) W or S, where the single letters are
IUPAC standard ambiguity codes. The sequence of a pooled
probe is thus of the form XXXN[(M/K) or (R/Y) or (W/S)][(M/K)
or (R/Y) or (W/S)]XXXXX, where XXX represents bases
complementary to the reference sequence. The three pooled
positions may be in any order, and may be contiguous or
separated by intervening nucleotides. For the two positions
occupied by [(M/K) or (R/Y) or (W/S)], two choices must be
made. First, one must select one of the following three pairs
of pooled nucleotides: (1) M/K, (2) R/Y and (3) W/S. The pair
selected may be the same or different at the two pooled
positions. Second, supposing, for example, one selects M/K at
one position, one must then choose between M and K. This
choice should result in selection of a pooled nucleotide
comprising a nucleotide that complements the corresponding
nucleotide in a reference sequence, when the probe and
reference sequence are maximally aligned. The same principle
governs the selection between R and Y, and between W and S.
A trellis pool probe has one pooled position with four
possibilities, and two pooled positions, each with two
possibilities. Thus, a trellis pool probe comprises a mixture
of 16 (4x2x2) probes. Since each pooled position includes one
nucleotide that complements the corresponding nucleotide from
the reference sequence, one of these 16 probes has a segment
that is the exact complement of the reference sequence. A
target sequence that is the same as the reference sequence
(i.e., a wildtype target) gives a hybridization signal to each
probe cell. Here, as in other tiling methods, the segment of
complementarity should be sufficiently long to permit specific
hybridization of a pooled probe to a reference sequence to be
detected relative to a variant of that reference sequence.
Typically, the segment of complementarity is about 9-21
nucleotides.
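The expansion of a trellis pool into its component probes can be illustrated with a short Python sketch. It assumes the standard IUPAC ambiguity codes and uses the pool TGGTGNKYGCCCT from the example in this section; the helper name `expand` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(pool: str) -> list[str]:
    """List every individual probe contained in a pooled probe."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pool.upper()))]


components = expand("TGGTGNKYGCCCT")
print(len(components))                      # 16 = 4 x 2 x 2
print(components.count("TGGTGAGTGCCCT"))    # 1: the single exact complement of the reference
```

Because each pooled position contains the wildtype complement exactly once, exactly one of the 16 components is the perfect complement of the reference.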

A target sequence is analyzed by comparing hybridization
intensities at three pooled probes, each having the structure
described above. The segments complementary to the reference
sequence present in the three pooled probes show some overlap.
Sometimes the segments are identical (other than at the
interrogation positions). However, this need not be the case.
For example, the segments can tile across a reference sequence
in increments of one nucleotide (i.e., one pooled probe
differs from the next by the acquisition of one nucleotide at
the 5' end and loss of a nucleotide at the 3' end). The three
interrogation positions may or may not occur at the same
relative positions within each pooled probe (i.e., spacing
from a probe terminus). All that is required is that one of
the three interrogation positions from each of the three
pooled probes aligns with the same nucleotide in the reference
sequence, and that this interrogation position is occupied by
a different pooled nucleotide in each of the three probes. In
one of the three probes, the interrogation position is
occupied by an N. In the other two pooled probes the
interrogation position is occupied by one of (M/K) or (R/Y) or
(W/S).

In the simplest form of the trellis strategy, three
pooled probes are used to analyze a single nucleotide in the
reference sequence. Much greater economy of probes is
achieved when more pooled probes are included in an array.
For example, consider an array of five pooled probes each
having the general structure outlined above. Three of these
pooled probes have an interrogation position that aligns with
the same nucleotide in the reference sequence and are used to
read that nucleotide. A different combination of three probes
has an interrogation position that aligns with a different
nucleotide in the reference sequence. Comparison of these
three probe intensities allows analysis of this second
nucleotide. Still another combination of three pooled probes
from the set of five has an interrogation position that
aligns with a third nucleotide in the reference sequence, and
these probes are used to analyze that nucleotide. Thus, three
nucleotides in the reference sequence are fully analyzed from
only five pooled probes. By comparison, the basic tiling
strategy would require 12 probes (four per nucleotide) for a
similar analysis.
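The probe economy can be checked with a small Python sketch. It assumes, as in the ten-pool example later in this section, that successive pools tile by one nucleotide and that each carries three adjacent pooled positions; `readable_positions` is a hypothetical helper, not part of the described method.

```python
def readable_positions(n_pools: int, pooled_offsets: tuple = (0, 1, 2)) -> dict:
    """Map each fully read reference position to the pools interrogating it.

    Hypothetical model: pool i (0-based) starts i nucleotides along the
    reference, and its pooled positions sit at the given offsets from that start.
    """
    coverage: dict[int, list[int]] = {}
    for pool in range(n_pools):
        for offset in pooled_offsets:
            coverage.setdefault(pool + offset, []).append(pool)
    # A nucleotide is fully read only when three different pools interrogate it.
    return {pos: pools for pos, pools in sorted(coverage.items()) if len(pools) == 3}


for n in (5, 10):
    read = readable_positions(n)
    print(f"{n} pools read {len(read)} nucleotides; basic tiling needs {4 * len(read)} probes")
```

Under this layout, n pools fully read n - 2 nucleotides, so the advantage over four probes per nucleotide grows with the length of the tiled region.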

As an example, a pooled probe for analysis of a target
sequence by the trellis strategy is shown below:

Target: ATTAACCACTCACGGGAGCTCT
Pool:       TGGTGNKYGCCCT

The pooled probe actually comprises 16 individual probes:

  TGGTGAGcGCCCT
+ TGGTGcGcGCCCT
+ TGGTGgGcGCCCT
+ TGGTGtGcGCCCT
+ TGGTGAtcGCCCT
+ TGGTGctcGCCCT
+ TGGTGgtcGCCCT
+ TGGTGttcGCCCT
+ TGGTGAGTGCCCT
+ TGGTGcGTGCCCT
+ TGGTGgGTGCCCT
+ TGGTGtGTGCCCT
+ TGGTGAtTGCCCT
+ TGGTGctTGCCCT
+ TGGTGgtTGCCCT
+ TGGTGttTGCCCT

The trellis strategy employs an array of probes having at
least three cells, each of which is occupied by a pooled probe
as described above.

Consider the use of three such pooled probes for
analyzing a target sequence, of which one position may contain
any single-base substitution to the reference sequence (i.e.,
there are four possible target sequences to be distinguished).
Three cells are occupied by pooled probes having a pooled
interrogation position corresponding to the position of
possible substitution in the target sequence: one cell with an
'N', one cell with one of 'M' or 'K', and one cell with 'R' or
'Y'. An interrogation position corresponds to a nucleotide in
the target sequence if it aligns with that nucleotide when the
probe and target sequence are aligned to maximize
complementarity. Note that although each of the pooled
probes has two other pooled positions, these positions are not
relevant for the present illustration. They become relevant
only when more than one position in the target sequence is
to be read, a circumstance considered later. For present
purposes, the cell with the 'N' in the interrogation position
lights up for the wildtype sequence and for any of the three
single-base substitutions of the target sequence. The cell
with M/K in the interrogation position lights up for the
wildtype sequence and one of the single-base substitutions.
The cell with R/Y in the interrogation position lights up for
the wildtype sequence and a second of the single-base
substitutions. Thus, the four possible target sequences
hybridize to the three pools of probes in four distinct
patterns, and the four possible target sequences can be
distinguished.
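The argument can be made concrete in Python. This sketch assumes the standard IUPAC meanings of N, M, K, R and Y and takes the wildtype base to be A, as in the example that follows; `choose` and `cell_pattern` are hypothetical names. For every choice of wildtype base, the four possible targets give four different patterns.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Probe-side bases contained in each pooled nucleotide (IUPAC codes).
POOL_BASES = {"N": "ACGT", "M": "AC", "K": "GT", "R": "AG", "Y": "CT", "W": "AT", "S": "CG"}


def choose(pair: tuple[str, str], wild: str) -> str:
    """Pick the member of an ambiguity pair that contains the wildtype complement."""
    return next(code for code in pair if COMPLEMENT[wild] in POOL_BASES[code])


def cell_pattern(wild: str, target: str) -> str:
    """y/n pattern of the N, M/K and R/Y cells for one target base."""
    cells = ["N", choose(("M", "K"), wild), choose(("R", "Y"), wild)]
    return "".join("y" if COMPLEMENT[target] in POOL_BASES[c] else "n" for c in cells)


wild = "A"
patterns = {t: cell_pattern(wild, t) for t in "ACGT"}
print(patterns)
```

With a wildtype A the M/K cell becomes K and the R/Y cell becomes Y, matching the pooled probe TGGTGNKYGCCCT used below.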

To illustrate further, consider four possible target
sequences (differing at a single position) and a pooled probe
having three pooled positions, N, K and Y, with the Y position
as the interrogation position (i.e., aligned with the variable
position in the target sequence):

Target
Wild:    ATTAACCACTCACGGGAGCTCT  (w)
Mutant:  ATTAACCACTCcCGGGAGCTCT  (c)
Mutant:  ATTAACCACTCgCGGGAGCTCT  (g)
Mutant:  ATTAACCACTCtCGGGAGCTCT  (t)

             TGGTGNKYGCCCT  (pooled probe)

The sixteen individual component probes of the pooled probe
hybridize to the four possible target sequences as follows:

                   TARGET
Probe              w   c   g   t
TGGTGAGcGCCCT      n   n   y   n
TGGTGcGcGCCCT      n   n   n   n
TGGTGgGcGCCCT      n   n   n   n
TGGTGtGcGCCCT      n   n   n   n
TGGTGAtcGCCCT      n   n   n   n
TGGTGctcGCCCT      n   n   n   n
TGGTGgtcGCCCT      n   n   n   n
TGGTGttcGCCCT      n   n   n   n
TGGTGAGTGCCCT      y   n   n   n
TGGTGcGTGCCCT      n   n   n   n
TGGTGgGTGCCCT      n   n   n   n
TGGTGtGTGCCCT      n   n   n   n
TGGTGAtTGCCCT      n   n   n   n
TGGTGctTGCCCT      n   n   n   n
TGGTGgtTGCCCT      n   n   n   n
TGGTGttTGCCCT      n   n   n   n



The pooled probe hybridizes according to the aggregate of its
components:

Pool: TGGTGNKYGCCCT      y   n   y   n



Thus, as stated above, a pooled probe having a Y at the
interrogation position hybridizes to the wildtype target and
one of the mutants (here, the g mutant). Similar tables can be
drawn to illustrate the hybridization patterns of probe pools
having other pooled nucleotides at the interrogation position.
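The table can be regenerated with a brief Python sketch. It assumes that only exactly complementary components hybridize (no mismatch tolerance) and that the probe, written as in the text, pairs base-for-base with target positions 5 to 17; the constants and `matching_components` are illustrative names.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "Y": "CT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

POOL = "TGGTGNKYGCCCT"
OFFSET = 4  # the probe faces target positions 5-17 (0-based index 4)
TARGETS = {
    "w": "ATTAACCACTCACGGGAGCTCT",
    "c": "ATTAACCACTCcCGGGAGCTCT",
    "g": "ATTAACCACTCgCGGGAGCTCT",
    "t": "ATTAACCACTCtCGGGAGCTCT",
}

# The 16 component probes of the pool.
components = ["".join(p) for p in product(*(IUPAC[b] for b in POOL))]


def matching_components(target: str) -> list[str]:
    """Components that are the exact complement of the aligned target segment."""
    site = target[OFFSET:OFFSET + len(POOL)].upper().translate(COMPLEMENT)
    return [probe for probe in components if probe == site]


for name, seq in TARGETS.items():
    hits = matching_components(seq)
    print(name, "y" if hits else "n", hits)
```

The pool as a whole lights whenever any component matches, which yields the aggregate pattern y n y n.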

The above strategy of using pooled probes to analyze a
single base in a target sequence can readily be extended to
analyze any number of bases. At this point, the purpose of
including three pooled positions within each probe becomes
apparent. In the example that follows, ten pools of probes,
each containing three pooled probe positions, are used to
analyze each of eight contiguous nucleotides in a target
sequence.
     ATTAACCACTCACGGGAGCTCT   Reference sequence
            XXXXXXXX          Readable nucleotides

Pools:
  4  TAATTNKYGAGTG
  5   AATTGNKRAGTGC
  6    ATTGGNKRGTGCC
  7     TTGGTNMRTGCCC
  8      TGGTGNKYGCCCT
  9       GGTGANKRCCCTC
 10        GTGAGNKYCCTCG
 11         TGAGTNMYCTCGA
 12          GAGTGNMYTCGAG
 13           AGTGCNMYCGAGA



In this example, the different pooled probes tile across
the reference sequence, each pooled probe differing from the
next by an increment of one nucleotide. For each of the
readable nucleotides in the reference sequence, there are
three probe pools having a pooled interrogation position
aligned with the readable nucleotide. For example, the 12th
nucleotide from the left in the reference sequence is aligned
with pooled interrogation positions in pooled probes 8, 9, and
10. Comparison of the hybridization intensities of these
pooled probes reveals the identity of the nucleotide occupying
position 12 in a target sequence.




                                     Pools
Targets                              8   9   10
Wild:    ATTAACCACTCACGGGAGCTCT      Y   Y   Y
Mutant:  ATTAACCACTCcCGGGAGCTCT      N   Y   Y
Mutant:  ATTAACCACTCgCGGGAGCTCT      Y   N   Y
Mutant:  ATTAACCACTCtCGGGAGCTCT      N   N   Y

[Figure: example intensities of pools 8, 9 and 10 for the
wildtype and mutant targets; legend: lit cell, blank cell.]

Thus, for example, if pools 8, 9 and 10 all light up, the
target sequence is wildtype. If pools 9 and 10 light up, the
target sequence has a C mutant at position 12. If pools 8 and
10 light up, the target sequence has a G mutant at position 12.
If only pool 10 lights up, the target sequence has a T mutant
at position 12.
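Reading position 12 can be sketched in Python. The sketch assumes exact-match hybridization, standard IUPAC codes, and the alignment used in this example (pool k faces target positions k - 3 onward, as worked out from the listed sequences); names such as `lights`, `lit_pools` and `DECODE` are hypothetical. The decoding table is derived from the pool sequences rather than typed in, so it doubles as a check on the rules stated above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "M": "AC", "R": "AG", "Y": "CT"}

REFERENCE = "ATTAACCACTCACGGGAGCTCT"
# Pools 8-10 from the example; pool k faces target positions k-3 onward
# (1-based), i.e. it starts at 0-based index k-4.
POOLS = {8: "TGGTGNKYGCCCT", 9: "GGTGANKRCCCTC", 10: "GTGAGNKYCCTCG"}


def lights(pool: int, target: str) -> bool:
    """Exact-match model: every probe base must admit the complement of the target base."""
    probe, start = POOLS[pool], pool - 4
    site = target[start:start + len(probe)].upper()
    return all(COMPLEMENT[t] in IUPAC[p] for p, t in zip(probe, site))


def lit_pools(target: str) -> frozenset:
    return frozenset(k for k in POOLS if lights(k, target))


def variant(base: str) -> str:
    """Reference with position 12 (0-based index 11) replaced by `base`."""
    return REFERENCE[:11] + base + REFERENCE[12:]


# Build the decoding table from the pool sequences themselves.
DECODE = {lit_pools(variant(b)): b for b in "ACGT"}

for lit, base in DECODE.items():
    print(sorted(lit), "->", base)
```

Pool 10 carries the N at position 12 and so always lights; pools 8 (Y) and 9 (K) each split the four bases into two pairs, and together they separate all four.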



The identity of other nucleotides in the target sequence
is determined by a comparison of other sets of three pooled
probes. For example, the identity of the 13th nucleotide in
the target sequence is determined by comparing the
hybridization patterns of the probe pools designated 9, 10 and
11. Similarly, the identity of the 14th nucleotide in the
target sequence is determined by comparing the hybridization
patterns of the probe pools designated 10, 11, and 12.

In the above example, successive probes tile across the
reference sequence in increments of one nucleotide, and each
probe has three interrogation positions occupying the same
positions in each probe relative to the terminus of the probe
(i.e., the 6th, 7th and 8th positions relative to the 3'
terminus). However, the trellis strategy does not require
that probes tile in increments of one or that the
interrogation positions occur in the same position in
each probe. In a variant of trellis tiling referred to as
"loop" tiling, a nucleotide of interest in a target sequence
is read by comparison of pooled probes, each of which has a
pooled interrogation position corresponding to the nucleotide
of interest, but in which the spacing of the interrogation
position in the probe differs from probe to probe.
Analogously to the block tiling approach, this allows several
nucleotides to be read from a target sequence from a
collection of probes that are identical except at the
interrogation positions. The identity in sequence of probes,
particularly at their 3' termini, simplifies synthesis of the
array and results in more uniform probe density per cell.

To illustrate the loop strategy, consider a reference
sequence of which the 4th, 5th, 6th, 7th and 8th nucleotides
(counted from the 3' termini of the probes) are to be read.
All four possible nucleotides at each of these positions can
be read from comparison of hybridization intensities of five
pooled probes. Note that the pooled positions in the probes
are different (for example, in probe 55 the pooled positions
are 4, 5 and 6, and in probe 56 they are 5, 6 and 7).



Each position of interest in the reference sequence is read by
comparing hybridization intensities for the three probe pools
that have an interrogation position aligned with the
nucleotide of interest in the reference sequence. For
example, to read the fourth nucleotide in the reference
sequence, probes 55, 58 and 59 provide pools at the fourth
position. Similarly, to read the fifth nucleotide in the
reference sequence, probes 55, 56 and 59 provide pools at the
fifth position. As in the previous trellis strategy, one of
the three probes being compared has an N at the pooled
position, and the other two have pooled nucleotides drawn
from two different pairs among (1) M or K, (2) R or Y and
(3) W or S.

The hybridization pattern of the five pooled probes to 
target sequences representing each possible nucleotide 
substitution at five positions in the reference sequence is 
shown below. Each possible substitution results in a unique 
hybridization pattern at three pooled probes, and the identity 
of the nucleotide at that position can be deduced from the 
hybridization pattern. 



     TAACCACTCACGGGAGCA   Reference sequence

55   ATTNKYGAGTGCC
56   ATTGNKRAGTGCC
57   ATTGGNKRGTGCC
58   ATTRGTNMGTGCC
59   ATTKRTGNGTGCC

                                  Pools
Targets                           55  56  57  58  59
Wild:    TAACCACTCACGGGAGCA       Y   Y   Y   Y   Y
Mutant:  TAAgCACTCACGGGAGCA       Y   N   N   N   N
Mutant:  TAAtCACTCACGGGAGCA       Y   N   N   Y   N
Mutant:  TAAaCACTCACGGGAGCA       Y   N   N   N   Y
Mutant:  TAACgACTCACGGGAGCA       N   Y   N   N   N
Mutant:  TAACtACTCACGGGAGCA       N   Y   N   N   Y
Mutant:  TAACaACTCACGGGAGCA       Y   Y   N   N   N
Mutant:  TAACCcCTCACGGGAGCA       N   Y   Y   N   N
Mutant:  TAACCgCTCACGGGAGCA       Y   N   Y   N   N
Mutant:  TAACCtCTCACGGGAGCA       N   N   Y   N   N
Mutant:  TAACCAgTCACGGGAGCA       N   N   N   Y   N
Mutant:  TAACCAtTCACGGGAGCA       N   Y   N   Y   N
Mutant:  TAACCAaTCACGGGAGCA       N   N   Y   Y   N
Mutant:  TAACCACaCACGGGAGCA       N   N   N   N   Y
Mutant:  TAACCACcCACGGGAGCA       N   N   Y   N   Y
Mutant:  TAACCACgCACGGGAGCA       N   N   N   Y   Y

Many variations on the loop and trellis tilings can be
created. All that is required is that each position in the
sequence have a probe with an 'N', a probe containing one
of R/Y, M/K or W/S, and a probe containing a different pool
from that set, each pool including the nucleotide complementary
to the wildtype target at that position, and at least one probe
with no pool at all at that position. This combination allows
all mutations at that position to be uniquely detected and
identified.
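The requirement can be checked mechanically for the loop tiling of pools 55-59. The Python sketch below is illustrative only: it assumes exact-match hybridization, the standard IUPAC codes, and that every pool faces the reference from its first base; `lights`, `pattern` and `pair_of` are hypothetical helpers.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "K": "GT", "M": "AC", "R": "AG", "Y": "CT", "W": "AT", "S": "CG"}

REFERENCE = "TAACCACTCACGGGAGCA"
LOOP_POOLS = {55: "ATTNKYGAGTGCC", 56: "ATTGNKRAGTGCC", 57: "ATTGGNKRGTGCC",
              58: "ATTRGTNMGTGCC", 59: "ATTKRTGNGTGCC"}


def lights(probe: str, target: str) -> bool:
    # All pools share the same start, so zip aligns probe and target directly.
    return all(COMPLEMENT[t] in IUPAC[p] for p, t in zip(probe, target.upper()))


def pattern(target: str) -> str:
    return "".join("Y" if lights(p, target) else "N" for _, p in sorted(LOOP_POOLS.items()))


def pair_of(code: str) -> int:
    return next(i for i, pair in enumerate(("MK", "RY", "WS")) if code in pair)


table = {}
for pos in range(4, 9):  # 1-based positions 4-8
    # Rule check: one N, two pools from different pairs, at least one fixed base.
    letters = [probe[pos - 1] for probe in LOOP_POOLS.values()]
    pooled = [c for c in letters if c not in "ACGT"]
    others = [c for c in pooled if c != "N"]
    assert pooled.count("N") == 1 and len(others) == 2
    assert pair_of(others[0]) != pair_of(others[1])
    assert len(pooled) < len(letters)
    for base in "ACGT":
        if base != REFERENCE[pos - 1]:
            mutant = REFERENCE[:pos - 1] + base.lower() + REFERENCE[pos:]
            table[mutant] = pattern(mutant)

assert pattern(REFERENCE) == "YYYYY"
assert len(set(table.values())) == len(table) == 15
```

All fifteen single-base variants produce distinct patterns, and none coincides with the all-lit wildtype pattern.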

A further class of strategies involving pooled probes is
termed coding strategies. These strategies assign code words
from some set of numbers to variants of a reference sequence.
Any number of variants can be coded. The variants can include
multiple closely spaced substitutions, deletions or
insertions. The designations (letters or other symbols)
assigned to each variant may be any arbitrary set of numbers,
in any order. For example, a binary code is often used, but
codes to other bases are entirely feasible. The numbers are
often assigned such that each variant has a designation having
at least one digit and at least one nonzero value for that
digit. For example, in a binary system, a variant assigned the
number 101 has a designation of three digits, with one
possible nonzero value for each digit.

The designations of the variants are coded into an array
of pooled probes comprising a pooled probe for each nonzero
value of each digit in the numbers assigned to the variants.
For example, if the variants are assigned successive numbers
in a numbering system of base m, and the highest number
assigned to a variant has n digits, the array would have about
$n \times (m-1)$ pooled probes. In general, $3N+1$ code words
(3N variants plus the wildtype) are needed to analyze all
variants of N locations in a reference sequence, each having
three possible mutant substitutions, so that
$n = \lceil \log_m(3N+1) \rceil$ digits suffice; for a binary
code this is also the number of pooled probes. For example,
10 base pairs of sequence may be analyzed with only 5 pooled
probes using a binary coding system.
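The count can be computed with integer arithmetic, which avoids rounding problems with floating-point logarithms. This Python sketch assumes the successive-number assignment described above; the function names are hypothetical.

```python
def digits_needed(n_locations: int, base: int = 2) -> int:
    """Smallest n with base**n >= 3N + 1 (3N single-base variants plus the wildtype)."""
    codewords = 3 * n_locations + 1
    n = 0
    while base ** n < codewords:
        n += 1
    return n


def pools_needed(n_locations: int, base: int = 2) -> int:
    """One pool per nonzero value of each digit: n x (m - 1)."""
    return digits_needed(n_locations, base) * (base - 1)


print(pools_needed(10))     # binary code for 10 locations
print(pools_needed(4))      # binary code for the four-location example
print(pools_needed(10, 3))  # a ternary code needs fewer digits but more pools
```

For 10 locations, 31 code words fit in five binary digits (32 values), giving the five pools quoted above.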
Each pooled probe has a segment exactly complementary to the
reference sequence except that certain positions are pooled.
The segment should be sufficiently long to allow specific
hybridization of the pooled probe to the reference sequence
relative to a mutated form of the reference sequence. As in
other tiling strategies, segment lengths of 9-21 nucleotides
are typical. Often the probe has no nucleotides other than
the 9-21 nucleotide segment. The pooled positions comprise
nucleotides that allow the pooled probe to hybridize to every
variant assigned a particular nonzero value in a particular
digit. Usually, the pooled positions further comprise a
nucleotide that allows the pooled probe to hybridize to the
reference sequence. Thus, a wildtype target (or reference
sequence) is immediately recognizable from all the pooled
probes being lit.

When a target is hybridized to the pools, only those
pools comprising a component probe having a segment that is
exactly complementary to the target light up. The identity of
the target is then decoded from the pattern of hybridizing
pools. Each pool that lights up is correlated with a
particular value in a particular digit. Thus, the aggregate
hybridization pattern of the lit pools reveals the value
of each digit in the code defining the identity of the target
hybridized to the array.
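Decoding is a table lookup once the lit pools are known. The sketch below uses the code assignment of the four-position example that follows (pool k corresponding to the digit of weight 2^(k-1), digits written pool 4 first); the names `CODES`, `code_word` and `identify` are hypothetical.

```python
# Code words from the four-position example (digit order: pool 4, 3, 2, 1).
CODES = {
    "1111": "wildtype",
    "0001": "position 5 G", "0101": "position 5 T", "1001": "position 5 A",
    "0010": "position 6 C", "0110": "position 6 G", "1010": "position 6 T",
    "0011": "position 7 G", "0111": "position 7 T", "1011": "position 7 A",
    "0100": "position 8 A", "1000": "position 8 C", "1100": "position 8 G",
}


def code_word(lit_pools: set[int], n_pools: int = 4) -> str:
    """Pool k carries the digit of weight 2**(k-1); a lit pool sets that digit to 1."""
    return "".join("1" if k in lit_pools else "0" for k in range(n_pools, 0, -1))


def identify(lit_pools: set[int]) -> str:
    """Map an observed set of lit pools to the variant it encodes."""
    return CODES.get(code_word(lit_pools), "no valid code word")


print(identify({1}))            # pattern NNNY for pools 4, 3, 2, 1
print(identify({1, 2, 3, 4}))   # all pools lit
```

Patterns that do not correspond to an assigned code word (for example, no pool lit) are reported rather than silently mapped to a variant.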
As an example, consider a reference sequence having four
positions, each of which can be occupied by three possible
mutations. Thus, in total there are 4x3 = 12 possible variant
forms of the reference sequence. Each variant is assigned one
of the binary numbers 0001-1100, and the wildtype reference
sequence is assigned the binary number 1111.



                  Positions
Target:   TAAC    C       A       C       T       CACGGGAGCA
                  C=1111  A=1111  C=1111  T=1111
                  G=0001  C=0010  G=0011  A=0100
                  T=0101  G=0110  T=0111  C=1000
                  A=1001  T=1010  A=1011  G=1100



A first pooled probe is designed by including probes that
complement exactly each variant having a 1 in the first digit.

Target  (1111)   TAACCACTCACGGGAGCA
Mutant  (0001)   TAACgACTCACGGGAGCA
Mutant  (0101)   TAACtACTCACGGGAGCA
Mutant  (1001)   TAACaACTCACGGGAGCA
Mutant  (0011)   TAACCAgTCACGGGAGCA
Mutant  (0111)   TAACCAtTCACGGGAGCA
Mutant  (1011)   TAACCAaTCACGGGAGCA

First pooled probe:
                 ATTG[GCAT]T[GCAT]AGTGCCC
                 ATTGNTNAGTGCCC



Second, third and fourth pooled probes are then designed,
respectively including component probes that hybridize to each
variant having a 1 in the second, third and fourth digit.

                      XXXX          - 4 positions examined
Target:           TAACCACTCACGGGAGCA
Pool 1 (1):       ATTGnTnAGTGCCC  =  16 probes  (4x1x4x1)
Pool 2 (2):       ATTGGnnAGTGCCC  =  16 probes  (1x4x4x1)
Pool 3 (4):       ATTGyrydGTGCCC  =  24 probes  (2x2x2x3)
Pool 4 (8):       ATTGmwmbGTGCCC  =  24 probes  (2x2x2x3)
The pooled probes hybridize to variant targets as follows:
Hybridization pattern:

                                         Pools
Targets                                  1   2   3   4
Wild (1111):     TAACCACTCACGGGAGCA      Y   Y   Y   Y
Mutant (0001):   TAACgACTCACGGGAGCA      Y   N   N   N
Mutant (0101):   TAACtACTCACGGGAGCA      Y   N   Y   N
Mutant (1001):   TAACaACTCACGGGAGCA      Y   N   N   Y
Mutant (0010):   TAACCcCTCACGGGAGCA      N   Y   N   N
Mutant (0110):   TAACCgCTCACGGGAGCA      N   Y   Y   N
Mutant (1010):   TAACCtCTCACGGGAGCA      N   Y   N   Y
Mutant (0011):   TAACCAgTCACGGGAGCA      Y   Y   N   N
Mutant (0111):   TAACCAtTCACGGGAGCA      Y   Y   Y   N
Mutant (1011):   TAACCAaTCACGGGAGCA      Y   Y   N   Y
Mutant (0100):   TAACCACaCACGGGAGCA      N   N   Y   N
Mutant (1000):   TAACCACcCACGGGAGCA      N   N   N   Y
Mutant (1100):   TAACCACgCACGGGAGCA      N   N   Y   Y



The identity of a variant (i.e., mutant) target is read
directly from the hybridization pattern of the pooled probes.
For example, the mutant assigned the number 0001 gives a
hybridization pattern of NNNY with respect to pools 4, 3, 2
and 1, respectively.

In the above example, variants are assigned successive
numbers in a numbering system. In other embodiments, sets of
numbers can be chosen for their properties. If the codewords
are chosen from an error-control code, the properties of that
code carry over to sequence analysis. An error-control code is
a numbering system in which some designations are assigned to
variants and other designations serve to indicate errors that
may have occurred in the hybridization process. For example,
if all codewords have an odd number of nonzero digits ('binary
coding + error detection'), any single error in hybridization
will be detected by having an even number of pools lit.



Wild Target:    TAACCACTCACGGGAGCA
Pool 1 (1):     ATTGnTnAGTGCCC  =  16 probes  (4x1x4x1)
Pool 2 (2):     ATTGGnnAGTGCCC  =  16 probes  (1x4x4x1)
Pool 3 (4):     ATTGryrhGTGCCC  =  24 probes  (2x2x2x3)
Pool 4 (8):     ATTGkwkvGTGCCC  =  24 probes  (2x2x2x3)

A fifth probe can be added to make the number of pools that
hybridize to any single mutation odd.

Pool 5 (16):    ATTGdhsmGTGCCC  =  36 probes  (3x3x2x2)




Hybridization of pooled probes to targets:

                                         Pool
Target                                   1   2   3   4   5
Target (11111):  TAACCACTCACGGGAGCA      Y   Y   Y   Y   Y
Mutant (00001):  TAACgACTCACGGGAGCA      Y   N   N   N   N
Mutant (10101):  TAACtACTCACGGGAGCA      Y   N   Y   N   Y
Mutant (11001):  TAACaACTCACGGGAGCA      Y   N   N   Y   Y
Mutant (00010):  TAACCcCTCACGGGAGCA      N   Y   N   N   N
Mutant (10110):  TAACCgCTCACGGGAGCA      N   Y   Y   N   Y
Mutant (11010):  TAACCtCTCACGGGAGCA      N   Y   N   Y   Y
Mutant (10011):  TAACCAgTCACGGGAGCA      Y   Y   N   N   Y
Mutant (00111):  TAACCAtTCACGGGAGCA      Y   Y   Y   N   N
Mutant (01011):  TAACCAaTCACGGGAGCA      Y   Y   N   Y   N
Mutant (00100):  TAACCACaCACGGGAGCA      N   N   Y   N   N
Mutant (01000):  TAACCACcCACGGGAGCA      N   N   N   Y   N
Mutant (11100):  TAACCACgCACGGGAGCA      N   N   Y   Y   Y
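The error-detecting property of the five-pool set can be checked in Python. This sketch assumes exact-match hybridization, reads the pooled positions in probe-strand IUPAC notation, and takes pool 1 as ATTGNTNAGTGCCC (T opposite the wildtype A at position 6); `pattern` and `check` are hypothetical helpers, and the single-substitution model is an assumption of the sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
REFERENCE = "TAACCACTCACGGGAGCA"
# Probe-strand pools, written 3'->5' under target positions 1-14.
POOLS = ["ATTGNTNAGTGCCC", "ATTGGNNAGTGCCC", "ATTGRYRHGTGCCC",
         "ATTGKWKVGTGCCC", "ATTGDHSMGTGCCC"]


def pattern(target: str) -> str:
    """Y/N for pools 1-5: a pool lights if some component exactly complements the target."""
    site = target.upper()
    return "".join(
        "Y" if all(COMPLEMENT[t] in IUPAC[p] for p, t in zip(pool, site)) else "N"
        for pool in POOLS
    )


variants = {}
for pos in range(5, 9):  # 1-based positions 5-8
    for base in "ACGT":
        if base != REFERENCE[pos - 1]:
            seq = REFERENCE[:pos - 1] + base.lower() + REFERENCE[pos:]
            variants[seq] = pattern(seq)

assert pattern(REFERENCE) == "YYYYY"
assert len(set(variants.values())) == 12                      # every variant decodable
assert all(p.count("Y") % 2 == 1 for p in variants.values())  # odd weight


def check(observed: str) -> str:
    """An even number of lit pools signals a hybridization error."""
    return "error detected" if observed.count("Y") % 2 == 0 else "consistent"
```

Flipping any single pool in an odd-weight pattern produces an even count, which is why one hybridization error is always flagged.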



9. Bridging Strategy 

Probes that contain partial matches to two separate
(i.e., noncontiguous) subsequences of a target sequence
sometimes hybridize strongly to the target sequence. In
certain instances, such probes have generated stronger signals
than probes of the same length which are perfect matches to
the target sequence. It is believed (but not necessary to the
invention) that this observation results from interactions of
a single target sequence with two or more probes
simultaneously. This invention exploits this observation to
provide arrays of probes having at least first and second
segments, which are respectively complementary to first and
second subsequences of a reference sequence. Optionally, the
probes may have a third or more complementary segments. These
probes can be employed in any of the strategies noted above.
The two segments of such a probe can be complementary to
disjoint subsequences of the reference sequence or to
contiguous subsequences. If the latter, the two segments in
the probe are inverted relative to the order of the complement
of the reference sequence. The two subsequences of the
reference sequence each typically comprise about 3 to 30
contiguous nucleotides. The subsequences of the reference
sequence are sometimes separated by 0, 1, 2 or 3 bases. Often
the subsequences are adjacent and nonoverlapping.

For example, a wildtype probe is created by
complementing two sections of a reference sequence (GGCTA and
CGAGG in the reference shown below) and reversing their order.
The interrogation position (the third base of the GCTCC
segment) is apparent from comparison of the structure of the
wildtype probe with the three mutant probes. The corresponding
nucleotide in the reference sequence is the A in CGAGG.



Reference: 5' T GGCTA CGAGG AATCATCTGTTA

Probes:    3' GCTCC CCGAT   (probe from first probe set)
           3' GCACC CCGAT
           3' GCCCC CCGAT
           3' GCGCC CCGAT

The expected hybridizations are:

Match:
      GCTCC CCGAT
...TGGCTACGAGGAATCATCTGTTA
      GCTCC CCGAT

Mismatch:
      GCTCC CCGAT
...TGGCTACGAGGAATCATCTGTTA
      GCGCC CCGAT

Bridge tilings are specified using a notation which gives
the length of the two constituent segments and the relative
position of the interrogation position. The designation n/m
indicates a segment complementary to a region of the reference
sequence which extends for n bases and is located such that
the interrogation position is in the mth base from the 5' end.
If m is larger than n, the entire segment is to the 5' side of
the interrogation position. If m is negative, the
interrogation position is the absolute value of m bases 5' of
the first base of the segment (m cannot be zero). Probes
comprising multiple segments, such as n/m + a/b + ..., have a
first segment at the 3' end of the probe and additional
segments added 5' with respect to the first segment. For
example, a 4/8 + 6/3 tiling consists of (from the 3' end of
the probe) a 4-base complementary segment, starting 7 bases 5'
of the interrogation position, followed by a 6-base region in
which the interrogation position is located at the third base.
Between these two segments, one base from the reference
sequence is omitted. By this notation, the set shown above is
a 5/3 + 5/8 tiling. Many different tilings are possible with
this method, since the lengths of both segments can be varied,
as well as their relative position (they may be in either
order and there may be a gap between them) and their
location relative to the interrogation position.
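The notation can be turned into probe sequences mechanically. The Python sketch below is one reading of the rules above, assuming probes are written 3'->5' so that each segment is the base-wise complement of its reference region; `segment` and `bridge_probe` are hypothetical names. Applied to the reference above, it reproduces the wildtype 5/3 + 5/8 probe.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def segment(reference: str, i: int, n: int, m: int) -> str:
    """Complement (written 3'->5') of the n-base region described by n/m.

    i is the 0-based index of the interrogation position in the 5'->3'
    reference. For m > 0 the interrogation position is the m-th base of the
    region counted from its 5' end; for m < 0 it lies |m| bases 5' of the
    region's first base.
    """
    if m == 0:
        raise ValueError("m cannot be zero")
    start = i - (m - 1) if m > 0 else i - m
    return reference[start:start + n].translate(COMPLEMENT)


def bridge_probe(reference: str, i: int, spec: list[tuple[int, int]]) -> str:
    """Join segments n/m + a/b + ...; the first segment sits at the probe's 3' end."""
    return "".join(segment(reference, i, n, m) for n, m in spec)


ref = "TGGCTACGAGGAATCATCTGTTA"
i = ref.index("CGAGG") + 2  # the A of CGAGG is interrogated
print(bridge_probe(ref, i, [(5, 3), (5, 8)]))  # 5/3 + 5/8 tiling
print(bridge_probe(ref, i, [(4, 8), (6, 3)]))  # 4/8 + 6/3 tiling (one base omitted)
```

The mutant probes of a set differ from this wildtype probe only at the base opposite the interrogation position.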

As an example, a 16-mer oligonucleotide target was hybridized
to a chip containing all $4^{10}$ probes of length 10. The chip
includes short tilings of both standard and bridging types.
The data from a standard 10/5 tiling were compared to data from
a 5/3 + 5/8 bridge tiling (see Table 1). Probe intensities
(mean count/pixel) are displayed along with discrimination
ratios (correct probe intensity / highest incorrect probe
intensity). Missing intensity values are less than 50 counts.
Note that for each base displayed the bridge tiling has a
higher discrimination value.



TABLE 1: Comparison of Standard and Bridge Tilings

TILING                PROBE BASE      CORRECT BASE:
                                      C      A      C      C
STANDARD (10/5)       A               92     496    294    299
                      C               536    148    532    534
                      G               69     167    72     52
                      T               146    95     212    126
                      DISCRIMINATION: 3.7    3.0    1.8    1.8
BRIDGING (5/3 + 5/8)  A                      404           156
                      C               276           345    379
                      G                      80
                      T                             58
                      DISCRIMINATION: >5.5   5.1    2.4    1.26



The bridging strategy offers the following advantages:

(1) Higher discrimination between matched and mismatched
probes.

(2) The possibility of using longer probes in a bridging
tiling, thereby increasing the specificity of the
hybridization, without sacrificing discrimination.

(3) The use of probes in which an interrogation position
is located very off-center relative to the regions of target
complementarity. This may be of particular advantage when,
for example, a probe centered about one region of the target
gives a low hybridization signal. The low signal is overcome
by using a probe centered about an adjoining region giving a
higher hybridization signal.

(4) Disruption of secondary structure that might result
in annealing of certain probes (see previous discussion of
helper mutations).

10. Deletion Tiling

Deletion tiling is related to both the bridging and
helper mutant strategies described above. In the deletion
strategy, comparisons are performed between probes sharing a
common deletion but differing from each other at an
interrogation position located outside the deletion. For
example, a first probe comprises first and second segments,
each exactly complementary to respective first and second
subsequences of a reference sequence, wherein the first and
second subsequences of the reference sequence are separated by
a short distance (e.g., 1 or 2 nucleotides). The order of the
first and second segments in the probe is usually the same as
that of the complement to the first and second subsequences in
the reference sequence. The interrogation position is usually
separated from the deletion. The comparison is performed with
three other probes, which are identical to the first probe
except at the interrogation position, which is different in
each probe.

Reference: ...AGTACCAGATCTCTAA...

Probe set:     CATGGNC AGAGA   (N = interrogation position)

Such tilings sometimes offer superior discrimination in
hybridization intensities between the probe having an
interrogation position complementary to the target and other
probes. Thermodynamically, the difference between the
hybridizations to matched and mismatched targets for the probe
set shown above is the difference between a single-base bulge
and a large asymmetric loop (e.g., two bases of target, one of
probe). This often results in a larger difference in
stability than the comparison of a perfectly matched probe
with a probe showing a single-base mismatch in the basic
tiling strategy.

The superior discrimination offered by deletion tiling is
illustrated by Table 2, which compares hybridization data from
standard 10/5 and 10/7 tilings with a (4/8 + 6/3) deletion
tiling of the reference sequence. (The numerators indicate the
length of the segments and the denominators, the spacing of
the deletion from the far termini of the segments.) Probe
intensities (mean count/pixel) are displayed along with
discrimination ratios (correct probe intensity / highest
incorrect probe intensity). Note that for each base displayed
the deletion tiling has a higher discrimination value than
either standard tiling shown.

TABLE 2. Comparison of Standard and Deletion Tilings

                          CORRECT PROBE BASE
TILING              PROBE        C       A       C       C
                    BASE
STANDARD (10/5)
                    A           92     496     294     299
                    C          536     148     532     534
                    G           69     167      72      52
                    T          146      95     212     126
    DISCRIMINATION:            3.7     3.0     1.8     1.8

DELETION (4/8 + 6/3)
                    A            6     412      29      48
                    C          297      32     465     160
                    G            8      77      10       4
                    T            8      26      31       5
    DISCRIMINATION:           37.1     5.4      15     3.3

STANDARD (10/7)
                    A          347     533     228     277
                    C          729     194     536     496
                    G          232     231     102      89
                    T          344     133     163     150
    DISCRIMINATION:            2.1     2.3     2.3     1.8
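The discrimination ratios in Table 2 can be recomputed directly from the probe intensities. This Python sketch simply transcribes the table's values; the dictionary layout and the function names are illustrative, not part of the original work.

```python
# Table 2 intensities (mean counts per pixel), one list per probe base;
# list positions are the table's four columns.
CORRECT = ["C", "A", "C", "C"]  # correct probe base for each column

TILINGS = {
    "standard 10/5": {"A": [92, 496, 294, 299], "C": [536, 148, 532, 534],
                      "G": [69, 167, 72, 52], "T": [146, 95, 212, 126]},
    "deletion 4/8 + 6/3": {"A": [6, 412, 29, 48], "C": [297, 32, 465, 160],
                           "G": [8, 77, 10, 4], "T": [8, 26, 31, 5]},
    "standard 10/7": {"A": [347, 533, 228, 277], "C": [729, 194, 536, 496],
                      "G": [232, 231, 102, 89], "T": [344, 133, 163, 150]},
}


def discrimination(column: dict, correct_base: str) -> float:
    """Correct probe intensity divided by the highest incorrect probe intensity."""
    highest_wrong = max(v for base, v in column.items() if base != correct_base)
    return column[correct_base] / highest_wrong


def ratios(tiling: dict) -> list:
    """Discrimination ratio for each column of one tiling."""
    return [discrimination({b: vals[i] for b, vals in tiling.items()}, correct)
            for i, correct in enumerate(CORRECT)]


if __name__ == "__main__":
    for name, tiling in TILINGS.items():
        print(name, [round(r, 1) for r in ratios(tiling)])
```

Rounded to one decimal place, the recomputed ratios match the printed values except in the third column of the 10/7 tiling, where 536/228 is about 2.35. In every column the deletion tiling's ratio exceeds that of both standard tilings.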



The use of deletion or bridging probes is quite general.
These probes can be used in any of the tiling strategies of
the invention. As well as offering superior discrimination,
the use of deletion or bridging strategies is advantageous for
certain probes to avoid self-hybridization (either within a
probe or between two probes of the same sequence).

C. Preparation of Target Samples 

The target polynucleotide, whose sequence is to be 
determined, is usually isolated from a tissue sample. If the 
target is genomic, the sample may be from any tissue (except 
exclusively red blood cells). For example, whole blood, 
peripheral blood lymphocytes or PBMC, skin, hair or semen are 
convenient sources of clinical samples. These sources are 
also suitable if the target is RNA. Blood and other body 
fluids are also a convenient source for isolating viral 
nucleic acids. If the target is mRNA, the sample is obtained 
from a tissue in which the mRNA is expressed. If the 
polynucleotide in the sample is RNA, it is usually reverse 
transcribed to DNA. DNA samples or cDNA resulting from
reverse transcription are usually amplified, e.g., by PCR.
Depending on the selection of primers and amplifying
enzyme(s), the amplification product can be RNA or DNA.
Paired primers are selected to flank the borders of a target
polynucleotide of interest. More than one target can be
simultaneously amplified by multiplex PCR, in which multiple
paired primers are employed. The target can be labelled at
one or more nucleotides during or after amplification. For
some target polynucleotides (depending on the size of the
sample), e.g., episomal DNA, sufficient DNA is present in the
tissue sample to dispense with the amplification step.
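How a primer pair defines the amplified product can be sketched as an exact-match search: the forward primer is found on the template strand, and the reverse primer, which anneals to the other strand, is found as its reverse complement downstream. This Python sketch is an idealization (real primers tolerate some mismatch and depend on annealing conditions), and the function names and the toy template are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the template segment bounded by a primer pair, or None.

    Both primers are written 5'->3'. The forward primer is matched exactly on
    the template strand; the reverse primer's reverse complement is matched on
    the same strand, downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        return None
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse)]


if __name__ == "__main__":
    template = "GGGGACGTTGCCCCCCCCTTAGCAAAAA"  # hypothetical template strand
    print(amplicon(template, forward="ACGTTG", reverse="TGCTAA"))
```

For the toy template this returns the 20-base product ACGTTGCCCCCCCCTTAGCA, which begins with the forward primer and ends with the reverse complement of the reverse primer.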

When the target strand is prepared in single-stranded 
form as in preparation of target RNA, the sense of the strand 
should of course be complementary to that of the probes on the 
chip. This is achieved by appropriate selection of primers. 
The target is preferably fragmented before application to the 
chip to reduce or eliminate the formation of secondary 
structures in the target. The average size of target
segments following hybridization is usually larger than the
size of the probes on the chip.
II. ILLUSTRATIVE CHIPS
A. HIV Chip 

HIV has infected a large and expanding number of people,
resulting in massive health care expenditures. HIV can
rapidly become resistant to drugs used to treat the infection,
primarily because of the action of HIV reverse transcriptase
(RT), a heterodimeric protein whose two subunits (51 kDa and
66 kDa) are both encoded by the 1.7 kb pol gene. The high error
rate (5-10 per round) of the RT protein is believed to account
for the hypermutability of HIV. The nucleoside analogues
commonly used to treat HIV infection, e.g., AZT, ddI, ddC, and
d4T, are converted to nucleotide analogues by sequential
phosphorylation in the cytoplasm of infected cells, where
incorporation of the analogue into the viral DNA terminates
viral replication because the 5'->3' phosphodiester linkage
cannot be completed. However, after about 6 months to 1 year
of treatment, or less, HIV typically mutates the RT gene so as
to become incapable of incorporating the analogue and thus
resistant to treatment. Several mutations known to be
associated with drug resistance are shown in the table below.
After a virus having drug resistance via a mutation becomes
predominant, the patient suffers dramatically increased viral
load, worsening symptoms (typically more frequent and
difficult-to-treat infections), and ultimately death.
Switching to a different treatment regimen as soon as a
resistant mutant virus takes hold may be an important step in
patient management, one that prolongs patient life and reduces
morbidity.
TABLE 3
SOME RT MUTATIONS ASSOCIATED WITH DRUG RESISTANCE

ANTIVIRAL     CODON   aa CHANGE            nt CHANGE
AZT              67   Asp -> Asn           GAC -> AAC
AZT              70   Lys -> Arg           AAA -> AGA
AZT             215   Thr -> Phe or Tyr    ACC -> TTC or TAC
AZT             219   Lys -> Gln or Glu    AAA -> CAA or GAA
AZT              41   Met -> Leu           ATG -> TTG or CTG
ddI and ddC     184   Met -> Val           ATG -> GTG
ddI and ddC      74   Leu -> Val
TIBO 82150      100   Leu -> Ile
ddC              65   Lys -> Arg           AAA -> AGA
ddC              69   Thr -> Asp           ACT -> GAT
3TC             184   Met -> Val           ATG -> GTG or GTA
3TC             184   Met -> Ile           ATG -> ATA
AZT + ddI        62   Ala -> Val           GCC -> GTC
AZT + ddI        75   Val -> Ile           GTA -> ATA
AZT + ddI        77   Phe -> Leu           TTC -> TTA
AZT + ddI       116   Phe -> Tyr           TTT -> TAT
AZT + ddI       151   Gln -> Met           CAG -> ATG
Nevirapine      103   Lys -> Asn           AAA -> AAT
                106   Val -> Ala           GTA -> GCA
                108
                181   Tyr -> Cys           TAT -> TGT
                188   Tyr -> His           TAT -> CAT
                190   Gly -> Ala           GGA -> GCA

N.B. Other mutations confer resistance to other drugs.
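Each nucleotide change in Table 3 can be checked against the standard genetic code. The following Python sketch builds the codon table from the conventional TCAG-ordered amino-acid string; `aa_change` is a hypothetical helper, not part of any library.

```python
BASES = "TCAG"
# Standard genetic code: amino acid (one-letter, '*' = stop) for codons in the
# order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
                "*": "Stop"}


def aa_change(old_codon: str, new_codon: str) -> str:
    """Describe a codon substitution as an 'Xxx -> Yyy' amino-acid change."""
    return f"{THREE_LETTER[CODE[old_codon]]} -> {THREE_LETTER[CODE[new_codon]]}"


if __name__ == "__main__":
    print(aa_change("ACC", "TAC"))  # codon 215
```

For example, `aa_change("ACC", "TAC")` gives "Thr -> Tyr", matching the entry for codon 215, and `aa_change("AAA", "AGA")` gives "Lys -> Arg".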



A second important therapeutic target for anti-HIV drugs
is the aspartyl protease enzyme encoded by the HIV genome,
whose function is required for the formation of infectious
progeny. See Robbins & Plattner, J. Acquired Immune
Deficiency Syndromes 6, 162-170 (1993); Kozal et al., Curr.
Op. Infect. Dis. 7:72-81 (1994). The protease functions in
processing viral precursor polypeptides to their active
forms. Drugs targeted against this enzyme do not impair
endogenous human proteases, thereby achieving a high degree of
selective toxicity. Moreover, the protease is expressed later
in the life-cycle than reverse transcriptase, thereby offering
the possibility of a combined attack on HIV at two different
times in its life-cycle. As with drugs targeted against
reverse transcriptase, administration of drugs targeted to the
protease can result in acquisition of drug resistance through
mutation of the protease. By monitoring the protease gene from
patients, it is possible to detect the occurrence of
mutations, and thereby make appropriate adjustments in the
drug(s) being administered.

In addition to being infected with HIV, AIDS patients are 
often also infected with a wide variety of other infectious 
agents giving rise to a complex series of symptoms. Often 
diagnosis and treatment are difficult because many different
pathogens (some life-threatening, others routine) cause 
similar symptoms. Some of these infections, so-called 
opportunistic infections, are caused by bacterial, fungal, 
protozoan or viral pathogens which are normally present in 
small quantity in the body, but are held in check by the 
immune system. When the immune system in AIDS patients fails, 
these normally latent pathogens can grow and generate rampant 
infection. In treating such patients, it would be desirable 
simultaneously to diagnose the presence or absence of a 
variety of the most lethal common infections, determine the 
most effective therapeutic regimen against the HIV virus, and
monitor the overall status of the patient's infection. 

The present invention provides DNA chips for detecting 
the multiple mutations in HIV genes associated with resistance 
to different therapeutics. These DNA chips allow physicians 
to monitor mutations over time and to change therapeutics if 
resistance develops. Some chips also provide probes for 
diagnosis of pathogenic microorganisms that typically occur in 
AIDS patients. 

The sequence selected as a reference sequence can be from 
anywhere in the HIV genome, but should preferably cover a 
region of the HIV genome in which mutations associated with
drug resistance are known to occur. A reference sequence is
usually between about 5, 10, 20, 50, 100, 500, 1000, 5,000 or
10,000 bases in length, and preferably is about 100-1700 bases
in length. Some reference sequences encompass at least part
of the reverse transcriptase sequence encoded by the pol gene.
Preferably, the reference sequence encompasses all, or
substantially all (i.e., about 75% or 90%), of the reverse
transcriptase gene. Reverse transcriptase is the target of
several drugs and, as noted above, the coding sequence is the
site of many mutations associated with drug resistance. In 
some chips, the reference sequence contains the entire region 
coding reverse transcriptase (850 bp) , and in other chips, 
subfragments thereof. In some chips, the reference sequence 
includes other subfragments of the pol gene encoding HIV 
protease or endonuclease, instead of, or as well as the 
segment encoding reverse transcriptase. In some chips, the 
reference sequence also includes other HIV genes such as env 
or gag as well as or instead of the reverse transcriptase 
gene. Certain regions of the gag and env genes are relatively 
well conserved, and their detection provides a means for 
identifying and quantifying the amount of HIV virus infecting 
a patient. In some chips, the reference sequence comprises an 
entire HIV genome. 

It is not critical from which strain of HIV the reference 
sequence is obtained. HIV strains are classified as HIV-I, 
HIV-II or HIV-III, and within these generic groupings there 
are several strains and polymorphic variants of each of these. 
BRU, SF2, HXB2, HXB2R are examples of HIV-1 strains, the 
sequences of which are available from GenBank. The reverse 
transcriptase genes of the BRU and SF2 strains differ at 23 
nucleotides. The HXB2 and HXB2R strains have the same reverse 
transcriptase gene sequence, which differs from that of the 
BRU strain at four nucleotides, and that of SF2 by 27 
nucleotides. In some chips, the reference sequence 
corresponds exactly to the reverse transcriptase sequence in 
the wildtype version of a strain. In other chips, the 
reference sequence corresponds to a consensus sequence of 
several HIV strains. In some chips, the reference sequence
corresponds to a mutant form of an HIV strain.
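The strain comparisons above (e.g., 23 differing nucleotides between the BRU and SF2 reverse transcriptase genes) amount to counting mismatches between aligned sequences, and a consensus reference can be assembled column by column. A minimal Python sketch, assuming the sequences are already aligned to equal length and using a simple majority rule (ties go to the base seen first); the majority rule is an illustrative choice, not necessarily how any consensus reference here was built, and the short sequences are made up.

```python
from collections import Counter
from typing import List


def differences(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def consensus(aligned: List[str]) -> str:
    """Majority base at each column; ties go to the base encountered first."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


if __name__ == "__main__":
    strains = ["ACGTAC", "ACGTTC", "ACCTAC"]  # hypothetical aligned fragments
    print(consensus(strains))
    print(differences(strains[0], strains[1]), differences(strains[1], strains[2]))
```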

Chips are designed in accordance with the tiling 
strategies noted above. The probes are designed to be 
complementary to either the coding or noncoding strand of the 
HIV reference sequence. If only one strand is to be read, it 
is preferable to read the coding strand. The greater 
percentage of A residues in this strand relative to the 
noncoding strand generally results in fewer regions of
ambiguous sequence. 
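The strand comparison above can be made concrete: the noncoding strand is the reverse complement of the coding strand, so its A content equals the coding strand's T content. A small Python sketch with a made-up fragment and hypothetical function names; it only computes base composition and does not model the hybridization behavior that makes one strand easier to read.

```python
from typing import Tuple


def a_fraction(seq: str) -> float:
    """Fraction of A residues in a DNA sequence."""
    return seq.count("A") / len(seq)


def strand_a_fractions(coding: str) -> Tuple[float, float]:
    """A fraction of the coding strand and of its complementary (noncoding) strand."""
    noncoding = coding.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return a_fraction(coding), a_fraction(noncoding)


if __name__ == "__main__":
    coding, noncoding = strand_a_fractions("AATGAAC")  # hypothetical fragment
    print(f"coding {coding:.2f}, noncoding {noncoding:.2f}")
```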

Some chips contain additional probes or groups of probes 
designed to be complementary to a second reference sequence. 
The second reference sequence is often a subsequence of the 
first reference sequence bearing one or more commonly 
occurring HIV mutations or interstrain variations (e.g., 
within codons 67, 70, 215 or 219 of the reverse transcriptase 
gene). The inclusion of a second group is particularly useful
for analyzing short subsequences of the primary reference 
sequence in which multiple mutations are expected to occur 
within a short distance commensurate with the length of the 
probes (i.e., two or more mutations within 9 to 21 bases). 
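Whether known mutations fall close enough together to share a probe can be checked by converting codon numbers to nucleotide spans. A minimal Python sketch, assuming codon 1 begins at nucleotide 1 of the reverse transcriptase coding sequence and a hypothetical probe length of 15 bases (within the 9 to 21 base range above); the function names are illustrative.

```python
from typing import Iterable, List


def codon_span(first: int, last: int) -> int:
    """Bases from the first nucleotide of codon `first` through the last of codon `last`."""
    return 3 * last - 3 * (first - 1)


def clusters(codons: Iterable[int], probe_length: int) -> List[List[int]]:
    """Group mutated codons whose combined span fits within one probe."""
    groups: List[List[int]] = []
    for codon in sorted(codons):
        if groups and codon_span(groups[-1][0], codon) <= probe_length:
            groups[-1].append(codon)
        else:
            groups.append([codon])
    return groups


if __name__ == "__main__":
    # Some reverse transcriptase codons listed in Table 3.
    print(clusters([41, 67, 70, 184, 215, 219], probe_length=15))
```

Under these assumptions, codons 67 and 70 span 12 bases and codons 215 and 219 span 15, so each pair falls within one 15-base probe; such pairs are where a second group of probes carrying the common mutations is most useful.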

The total number of probes on the chips depends on the 
tiling strategy, the length of the reference sequence and the 
options selected with respect to inclusion of multiple probe 
lengths and secondary groups of probes to provide confirmation 
of the existence of common mutations. To read much or all of 
the HIV reverse transcriptase gene (857 bases for the BRU
strain), chips tiled by the basic strategy typically contain at
least 857 x 4 = 3428 probes.
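The probe count above follows from the basic strategy's four probes per interrogated position. A minimal Python sketch of that tiling, assuming probes are written as the base-by-base complement aligned under the reference (as in the deletion-tiling example earlier), a hypothetical probe length and interrogation offset, and enough flanking reference sequence around the region being read; `basic_tiling` is an illustrative name.

```python
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def basic_tiling(reference: str, first: int, last: int,
                 probe_length: int, offset: int) -> List[str]:
    """Four probes for each reference position first..last (0-based, inclusive).

    Each probe is the complement of the probe_length bases around the position,
    with the base at `offset` replaced in turn by A, C, G and T.
    """
    if not 0 <= offset < probe_length:
        raise ValueError("offset must lie within the probe")
    probes = []
    for pos in range(first, last + 1):
        start = pos - offset
        if start < 0 or start + probe_length > len(reference):
            raise ValueError(f"position {pos} lacks flanking reference sequence")
        comp = [COMPLEMENT[b] for b in reference[start:start + probe_length]]
        for base in "ACGT":
            comp[offset] = base  # one probe per possible base at this position
            probes.append("".join(comp))
    return probes


if __name__ == "__main__":
    ref = "GATTACAGATTACA"  # hypothetical reference
    probes = basic_tiling(ref, first=2, last=11, probe_length=5, offset=2)
    print(len(probes), probes[:4])
```

Reading an 857-base region this way returns 4 x 857 = 3428 probes; additional probe lengths or a second group of probes add to that total.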

The target HIV polynucleotide, whose sequence is to be 
determined, is usually isolated from blood samples (peripheral 
blood lymphocytes or PBMC) in the form of RNA. The RNA is 
reverse transcribed to DNA, and the DNA product is then 
amplified. Depending on the selection of primers and 
amplifying enzyme, the amplification product can be RNA or 
DNA. Suitable primers for amplification of target are shown 
in the table below. 
TABLE 4
AMPLIFICATION OF TARGET

TARGET SIZE   FORWARD PRIMER                     REVERSE PRIMER
1,742 bp      GTAGAATTCTGTTGACTCAGATTGG          GATAAGCTTGGGCCTTATCTATTCCAT
535 bp        AAATCCATACAATACTCCAGTATTTGC        ACCCATCCAAAGGAATGGAGGTTCTTTC
323 bp        GenBank K02013, bases 1889-1908    bases 2211-2192

AATTAACCCTCACTAAAGGGAgaggaagaatctgttgactcagattggt (RT01-T3)
AA'I'I 1 AATACGACTCACTATAGGGAtaccccacttactlctgtatgtcattgaca-3' (89-391 T7)
AATTAACCCTCACTAAAGGGAgaagtatactgcattaccatacctagta (RT03-T3)
Taatacgactcactatagggagatcgacgcaggactcggcctgctgaa (HV1-T2)
AATTAACCCTCACTAAAGGGAGAccttgtaagtcattggtcttaaaggta (HV2-T3)
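Several primers in Table 4 begin with an uppercase extension that reads as a bacteriophage T3 (AATTAACCCTCACTAAAGGG) or T7 (TAATACGACTCACTATAGGG) promoter sequence. Such a tail lets the amplified DNA serve as a template for RNA transcription, one way the amplification product can be RNA rather than DNA. The Python sketch below, with hypothetical function names, finds such a tail and returns the remaining primer sequence, and also checks the 323 bp product size against its GenBank coordinates.

```python
from typing import Optional, Tuple

T3_PROMOTER = "AATTAACCCTCACTAAAGGG"
T7_PROMOTER = "TAATACGACTCACTATAGGG"


def promoter_tail(primer: str) -> Tuple[Optional[str], str]:
    """Return (promoter name, sequence after the promoter), or (None, primer)."""
    upper = primer.upper()  # case marks tail vs. target-specific part; ignore it here
    for name, site in (("T3", T3_PROMOTER), ("T7", T7_PROMOTER)):
        i = upper.find(site)
        if i != -1:
            return name, primer[i + len(site):]
    return None, primer


def amplicon_length(forward_start: int, reverse_start: int) -> int:
    """Inclusive length between the 5' ends of the two primers on the reference."""
    return abs(reverse_start - forward_start) + 1


if __name__ == "__main__":
    print(promoter_tail("AATTAACCCTCACTAAAGGGAgaggaagaatctgttgactcagattggt"))
    print(amplicon_length(1889, 2211))
```

With the first tailed primer it reports a T3 tail followed by Agaggaagaatctgttgactcagattggt, and the coordinates 1889 and 2211 give 2211 - 1889 + 1 = 323 bases, matching the listed target size.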

In another aspect of the invention, chips are provided
for simultaneous detection of HIV and microorganisms that
commonly parasitize AIDS patients (e.g., cytomegalovirus
(CMV), Pneumocystis carinii (PCP), fungi (Candida albicans),
and mycobacteria). Non-HIV viral pathogens are detected and
their drug resistance determined using a strategy similar to
that used for HIV. That is, groups of probes are designed to
show complementarity to a target sequence from a region of the
genome of a non-HIV viral pathogen known to be associated with
acquisition of drug resistance. For example, CMV and HSV
viruses, which frequently co-parasitize AIDS patients, undergo
mutations to acquire resistance to acyclovir.

For detection of non-viral pathogens, the chips include
an array of probes which allow full-sequence determination of
16S ribosomal RNA or corresponding genomic DNA of the
pathogens. The additional probes are designed by the same
principles as described above, except that the target sequence
is a variable region from a 16S RNA (or corresponding DNA) of
a pathogenic microorganism. Alternatively, the target
sequence can be a consensus sequence of variable 16S rRNA
regions from multiple organisms. 16S ribosomal DNA and RNA are
present in all organisms (except viruses), and the sequence of
the DNA or RNA is closely related to the evolutionary genetic
distance between any two species. Hence, organisms which are
quite close in type (e.g., all mycobacteria) share a common



